Abstract

We present a novel error-correcting code and decoding algorithm whose construction
is similar to that of expander codes. The code is based on a bipartite graph derived
from the subsumption relations of finite projective geometry, with Reed-Solomon
codes as component codes. For decoding, we use a modified version of Zemor's
well-known decoding algorithm for expander codes. By deriving geometric bounds
rather than eigenvalue bounds, we prove that for practical values of the code rate,
the random error correction capability of our codes is much better than the bounds
derived for previously studied graph codes, including Zemor's bound. MATLAB
simulations further reveal that the average-case performance of this code is 10
times better than these geometric bounds in almost 99% of the test cases. By
exploiting the symmetry of projective space lattices, we have designed a
corresponding decoder that has optimal throughput. The decoder design has been
prototyped on a Xilinx Virtex 5 FPGA. The codes are designed for potential
applications in secondary storage media. As an application, we also discuss the use
of these codes to improve the burst error correction capability of a CD-ROM decoder.

1 Introduction

Graph codes have been studied and analyzed in the past in order to find codes that
give good error correction capability at high code rates [1], [2], [3]. At the same time,
from a practical implementation point of view, such codes need encoding and decoding
algorithms that are efficient in terms of speed and hardware complexity.
Expander codes, first suggested by Sipser and Spielman [1], proved to be theoretically
capable of providing asymptotically good codes that are decodable in logarithmic
time and can be implemented with a circuit whose size grows linearly with code size.
These codes are constructed using a special graph, known as an expander graph, and by
embedding identical component codes at the nodes of this graph. [3] analyzed the
properties of expander codes, and specified a lower bound on the number of errors that
will always be corrected by one decoding algorithm. Zemor, in his analysis in [3],
suggested using bipartite Ramanujan graphs for constructing expander codes. He also
provided a decoding algorithm which improved the lower bound in [3] by a factor
of 12. In [4], Hoholdt and Justesen built on the work of Tanner on graph codes
by suggesting the use of Reed-Solomon codes as sub-codes for graphs derived from
point-line incidence relations of projective planes. The decoding speed and ease of
implementation, combined with error-correction performance that scales with
increasing graph size, make all these codes interesting when considering applications
related to secondary storage.

By definition, the sub-code length should remain constant as the order of the (bipartite)
expander graph increases during construction of a family of expander codes. For
performance reasons described later, we choose to increase the sub-code length as
well. Hence our codes may not be called expander codes, but rather graph-based or
expander-like codes. The presented work thus deals with the construction and analysis
of an expander-like code based on a special bipartite Ramanujan graph. This
bipartite graph is derived from point-hyperplane incidence relations of projective spaces
of higher dimensions than those suggested by [4]. We look at various properties of
the codes, and in the process come across several interesting generic properties of
projective geometry. We also want the codes to be practically useful. Hence, in
a companion paper, we present a throughput-optimal VLSI design of a decoder for such
codes [5].

For decoding, we employ a variation of Zemor's algorithm. Simulations using this
algorithm show that the codes are highly robust to random as well as burst errors.
Hence we envisage their application in data storage systems.

The next section provides the basic properties of cardinalities of projective geometry
that we will be using. Section 3 gives relevant background information for the various
concepts required in this paper. Section 4 describes our code construction in detail.
Sections 5-11 give the characterization of error-correction capability for these codes,
including proofs of propositions relating to bounds on error correction capacity. The
remaining sections detail the prototyping results and the application to two
types of storage media (namely CD-ROMs and DVD-R), before concluding the paper.

2 Finite Projective Spaces 

In this section, we look at how finite projective spaces are generated from finite fields.
For more details, refer to [6]. Consider a finite field F = GF(s) with s elements, where s is
a power of a prime number p, i.e. $s = p^k$, k being a positive integer. A projective space
of dimension d is denoted by P(d, F) and consists of the one-dimensional subspaces of the
(d + 1)-dimensional vector space over F (an extension field over F), denoted by $F^{d+1}$.
Elements of this vector space are of the form $(x_1, \ldots, x_{d+1})$, where each $x_i \in F$. The
total number of such elements is $s^{d+1} = p^{k(d+1)}$. An equivalence relation between
these elements is defined as follows. Two non-zero elements x, y are equivalent if there
exists an element $\lambda \in GF(s)$ such that $x = \lambda y$. Clearly, each equivalence class consists
of s elements of the field (s - 1 non-zero elements and 0), and forms a one-dimensional
subspace. Such a one-dimensional vector subspace corresponds to a point in the projective
space. Points are the zero-dimensional subspaces of the projective space. Therefore, the
total number of points in P(d, F) is

\[
P(d) = \frac{\#\,\text{non-zero elements in the field}}{\#\,\text{non-zero elements in one equivalence class}}
     = \frac{s^{d+1} - 1}{s - 1} \qquad (2)
\]

An m-dimensional subspace of P(d, F) consists of all the one-dimensional subspaces of
an (m + 1)-dimensional subspace of the vector space. The basis of this vector subspace
has (m + 1) linearly independent elements, say $b_0, \ldots, b_m$. Every element x of this
subspace can be represented as a linear combination of these basis vectors:

\[
x = \sum_{i=0}^{m} \alpha_i b_i, \quad \text{where } \alpha_i \in \mathrm{GF}(s) \qquad (3)
\]
Clearly, the number of elements in the vector subspace is $s^{m+1}$. The number of points
in the m-dimensional projective subspace is given by P(m), as defined in equation (2).
Various properties, such as degree, of an m-dimensional projective subspace remain
the same when this subspace is bijectively mapped to some (d - m - 1)-dimensional
projective subspace. The two sets of these subspaces, one for each dimension, are
said to be dual to each other. The number of m-dimensional projective subspaces of a
d-dimensional projective space can be counted using the Gaussian coefficient

\[
\phi(d, m, s) = \frac{(s^{d+1} - 1)(s^{d} - 1) \cdots (s^{d-m+1} - 1)}{(s - 1)(s^{2} - 1) \cdots (s^{m+1} - 1)}
\]

For $0 \le l < m \le d$, the number of l-dimensional projective subspaces contained in an
m-dimensional projective subspace is $\phi(m, l, s)$, while the number of m-dimensional
projective subspaces containing a particular l-dimensional projective subspace is
$\phi(d - l - 1, m - l - 1, s)$.
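The counting formulas above are easy to check numerically. The following is a minimal sketch, assuming the Gaussian coefficient is written as the product of (s^(d+1-i) - 1)/(s^(i+1) - 1) for i = 0..m; the function names `phi` and `flats_through` are illustrative and not taken from the paper.

```python
def phi(d: int, m: int, s: int) -> int:
    """Number of m-dimensional projective subspaces of a d-dimensional
    projective space over GF(s), computed as a Gaussian coefficient."""
    if m < -1 or m > d:
        return 0
    num, den = 1, 1
    for i in range(m + 1):
        num *= s ** (d + 1 - i) - 1   # (s^(d+1) - 1)(s^d - 1)...(s^(d-m+1) - 1)
        den *= s ** (i + 1) - 1       # (s - 1)(s^2 - 1)...(s^(m+1) - 1)
    return num // den                 # the quotient is always an integer


def flats_through(d: int, l: int, m: int, s: int) -> int:
    """Number of m-dimensional flats containing a fixed l-dimensional flat."""
    return phi(d - l - 1, m - l - 1, s)


if __name__ == "__main__":
    print("points in P(5, GF(2)):     ", phi(5, 0, 2))
    print("hyperplanes in P(5, GF(2)):", phi(5, 4, 2))
    print("hyperplanes per point:     ", flats_through(5, 0, 4, 2))
```

By duality, the point count and the hyperplane count returned by `phi` agree for any dimension and field size.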



2.1 Projective Spaces as Lattices 

It is a well-known fact that the lattice of subspaces in any projective space is a
modular, geometric lattice [6]. A projective space of dimension 2 is shown in Figure 1.
In the figure, the top-most node represents the supremum, which is a projective
space of dimension m in a lattice for P(m, GF(q)). The bottom-most node represents
the infimum, which is a projective space of (notational) dimension -1. Each node in
the lattice is a projective subspace, called a flat. Each horizontal level of
flats represents the collection of all projective subspaces of P(m, GF(q)) of a particular
dimension. For example, the first level of flats above the infimum consists of flats of
dimension 0, the next level of flats of dimension 1, and so on. Some levels have special
names. The flats of dimension 0 are called points, flats of dimension 1 are called lines,
flats of dimension 2 are called planes, and flats of dimension (m-1) in an overall
projective space P(m, GF(q)) are called hyperplanes.




Figure 1: An Example PG Bipartite Graph 



3 Expander Codes 

Expander codes are a family of asymptotically good, linear error-correcting codes [1].
They can be decoded in sub-linear time (proportional to log(n), where n is the length
of the codeword) using parallel decoding algorithms. Further, this can be achieved using
identical component decoders, whose count is proportional to n. These codes are based
on a class of graphs known as expander graphs. One construction of expander graph
used in the construction of expander codes considers the edge-vertex incidence
graph Z of a d-regular graph G. The edge-vertex incidence graph of G = (V, E), a
(2,d)-regular bipartite graph, has vertex set E ∪ V and edge set

\[ \{(e, v) \in E \times V : v \text{ is an endpoint of } e\} \]

Vertices of Z corresponding to edges E of G are then associated with variables, while
vertices of Z corresponding to vertices of G are associated with constraints on these
variables. Each constraint corresponds to a set of linear restrictions on the d variables
that are its neighbors. In particular, a constraint requires that the variables it
restricts form a codeword in some linear code of length d. This constraint forms the
sub-code for that expander code. Further, all the constraints are required to impose
isomorphic codes (on different variables, of course). The default construction of a
family of expander codes requires d to remain constant as the order of G increases.
Formally, let Z be a (2,d)-regular graph between a set of n nodes called variables and
$\frac{2}{d}n$ nodes called constraints. Let z(i,j) be a function such that for each constraint $C_i$,
the variables neighboring $C_i$ are $v_{z(i,1)}, \ldots, v_{z(i,d)}$. Let S be an error-correcting code
of block length d. The expander code C(Z, S) is the code of block length n whose
codewords are the words $(x_1, \ldots, x_n)$ such that, for $1 \le i \le \frac{2}{d}n$,
$(x_{z(i,1)}, \ldots, x_{z(i,d)})$ is a codeword of S.
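The definition can be made concrete with a toy example. The sketch below is only an illustration under assumed choices that do not come from the paper: G is the complete graph K4 (which is 3-regular), and the component code S is the binary single-parity-check code of length 3. The helper names are hypothetical.

```python
from itertools import combinations, product


def edge_vertex_incidence(vertices, edges):
    """Map each constraint (a vertex of G) to the list of variable indices
    (edges of G) that are incident on it."""
    return {v: [i for i, e in enumerate(edges) if v in e] for v in vertices}


def is_codeword(word, constraints, in_subcode):
    """A word is in C(Z, S) iff every constraint sees a codeword of S."""
    return all(in_subcode([word[i] for i in idx])
               for idx in constraints.values())


def even_parity(symbols):
    # Assumed toy component code S: binary single-parity-check code.
    return sum(symbols) % 2 == 0


vertices = range(4)
edges = list(combinations(vertices, 2))        # K4: 6 edges, every vertex has degree 3
constraints = edge_vertex_incidence(vertices, edges)

# Enumerate all binary words of length n = |E| and keep the codewords.
codewords = [w for w in product((0, 1), repeat=len(edges))
             if is_codeword(w, constraints, even_parity)]

if __name__ == "__main__":
    print("block length:", len(edges))
    print("constraints: ", len(constraints))
    print("codewords:   ", len(codewords))
```

Each variable (edge) appears in exactly two constraints and each constraint checks d = 3 variables, which is the (2,d)-regular structure described above.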

3.1 Expander Graphs 

An expander graph is a graph in which every set of vertices has an unusually large 
number of neighbors. More formally, 

Let G = (V, E) be a graph with n vertices. Then the graph G is a $\delta$-expander if every
set of at most m vertices expands by a factor of $\delta$. That is,

\[ \forall S \subset V : |S| \le m \implies |\{y : \exists x \in S \text{ s.t. } (x, y) \in E\}| > \delta \cdot |S| \]

Since expander codes are a subclass of LDPC codes, for which efficient iterative decoding
using the variables and constraints of a bipartite graph is feasible, we are mainly
interested in bipartite expander graphs.

The degree of "goodness" of expansion, especially for regular graphs, can also be
measured using the eigenvalues of the graph. The largest eigenvalue of a k-regular graph
is k. If the second largest eigenvalue is much smaller than the first (k), then the graph
is known to be a good expander [1].

3.2 Good Expander Codes 

As pointed out earlier, the decoding algorithm for such codes is iterative. Hence good 
expander codes imply at least the following properties. 

• Better minimum distance than other codes of same length, 

• Fast convergence, and 

• Better code rate than other codes in the same class 

Good codes having the above properties can be identified with the help of three theorems
proved in [1]. For the theorems, we assume that an expander code C(Z, S) has been
constructed having S as a linear code of rate r, block length d, and minimum relative
distance $\varepsilon$, and Z as the edge-vertex incidence graph of a d-regular graph G with
second-largest eigenvalue $\lambda$.

Theorem 1. The code C(Z, S) has rate at least 2r - 1, and minimum relative distance
at least $\left(\frac{\varepsilon - \lambda/d}{1 - \lambda/d}\right)^2$.

Theorem 2. If a parallel decoding round for C(Z, S), as given in [1], is given as input
a word of relative distance $\alpha$ from a codeword, then it will output a word of relative
distance at most $\alpha\left(\frac{2}{3} + \frac{16\alpha}{\varepsilon^2} + \frac{4\lambda}{\varepsilon d}\right)$ from that codeword.

Theorem 3. For all $\varepsilon$ such that $1 - 2H(\varepsilon) > 0$, where H(·) is the binary entropy
function, there exists a polynomial-time constructible family of expander codes of rate
$1 - 2H(\varepsilon)$ and minimum relative distance arbitrarily close to $\varepsilon^2$ in which any
$\alpha < \varepsilon^2/48$ fraction of error can be corrected by a circuit of size O(n log n) and
depth O(log n).

From Theorem 1 we observe that to have high minimum relative distance, we should
have $\varepsilon$ high and $\lambda/d$ low. Since Z has been constructed out of the d-regular graph G,
low $\lambda/d$ signifies a large gap between the first and second eigenvalues, i.e. the graph G
has to be a "good" expander graph. Further, to have high rate, S has to have a high rate
r as well, in addition to having high minimum relative distance $\varepsilon$.

From Theorem 2 we observe that to shrink the distance of the input word maximally after
one iteration, we again need $\varepsilon$ as high as possible and $\lambda/d$ as low as possible. Such
maximal shrinking of distance per iteration leads to the fastest convergence possible,
as is also brought out in the proof of Theorem 3.

From Theorem 3 we observe that to be able to correct as high a fraction of errors as
possible, we again need $\varepsilon$ as high as possible.

As an aside, $\lambda/d$ is low whenever d is high. In PG-based bipartite graphs, d does increase
as n increases (hence expander-like codes), which helps in making the overall
code advantageous over classical expander codes.
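To see how the two Theorem 1 bounds react to the ratio between the second eigenvalue and the degree, the following is a small sketch that evaluates them. It assumes the standard Sipser-Spielman form of Theorem 1 (rate at least 2r - 1, relative distance at least ((eps - lam/d) / (1 - lam/d))^2); the function names and the sample parameter values are illustrative only.

```python
def rate_bound(r: float) -> float:
    """Theorem 1 lower bound on the rate of C(Z, S) for subcode rate r."""
    return 2 * r - 1


def min_rel_distance_bound(eps: float, lam: float, d: int) -> float:
    """Theorem 1 lower bound on the minimum relative distance of C(Z, S).

    eps: minimum relative distance of the component code S
    lam: second-largest eigenvalue of the d-regular graph G
    """
    ratio = lam / d
    if eps <= ratio:
        return 0.0                    # the bound carries no information here
    return ((eps - ratio) / (1 - ratio)) ** 2


if __name__ == "__main__":
    # Illustrative values: a degree-31 graph, a subcode of relative
    # distance 15/31, and two candidate second eigenvalues.
    for lam in (4, 10):
        bound = min_rel_distance_bound(15 / 31, lam, 31)
        print(f"lam = {lam:2d}: relative distance bound = {bound:.4f}")
```

For a fixed component code, the printed bound shrinks as the second eigenvalue grows, which is the dependence on the eigenvalue gap discussed above.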

3.3 Good Expander Graphs 

Zemor pointed out [3] that if G is a bipartite graph, then the fraction of random errors
that can be corrected using a parallel iterative decoding algorithm can be increased
twelve-fold. He also pointed out that the bound on minimum distance given in
Theorem 1 can also be achieved faster if one considers Ramanujan
graphs (since the value of $\lambda/d$ is low for these graphs). Overall, he suggested using
bipartite Ramanujan graphs for the construction of good expander codes instead.

4 Details of Expander-like Code



4.1 PG Graphs as Good Expander Graphs 

Balanced regular bipartite graphs $G_{d,d}$ which are symmetric balanced incomplete block
designs (BIBDs) are known to be Ramanujan graphs [2]. Incidence relations of
projective geometry structures give such BIBDs, and hence Ramanujan graphs. The use of
a projective plane as $G_{d,d}$, along with RS codes as component codes, to construct good
expander-like graph codes was first reported in [4]. For our work, we do not limit
ourselves to projective planes: for better performance, we make use of point-hyperplane
incidence graphs from higher dimensional projective spaces, which also satisfy the
eigenvalue properties that make a Ramanujan graph. Some reasons for using
projective geometry are as follows.

• As detailed in a companion paper [5], the mapping of vertices to points and
hyperplanes enables us to use several projective geometry properties for
disproving the existence of certain bipartite subgraphs of a fixed minimum degree. This
strategy leads us to finding the minimum number of vertices required to form a
complete bipartite subgraph of a given minimum degree. This number of vertices
is required to calculate tight geometric bounds for the error correction capability
of the overall code. Thus, we do not need the complicated eigenvalue arguments
used by [1] and [3]. Also, the bounds obtained in this manner are better than those of
our predecessors. Furthermore, Zemor had restricted the subcodes to be constrained
by $d > 3\lambda$, $\lambda$ being the second largest eigenvalue of the graph. We have no such
restriction.

• The use of projective geometry also helps in developing a perfect folded architecture
of the decoder for hardware implementation, discussed later in section 10.
This particular folding enables efficient utilization of processors and memories,
while being throughput-optimal.

4.2 Reed-Solomon Codes as Good Component Codes 

By choosing a "good expander" graph, and fixing a code with high minimum relative
distance $\varepsilon$, one can design a code having the first two properties described in section 3.2.
To simultaneously have a high code rate for C(Z, S), the component code S also needs
to have a high rate r. Reed-Solomon codes are a class of non-binary, linear codes which,
for a given rate, have the best minimum relative distance (so-called maximum distance
separable codes), and vice versa. Hence we use RS codes as the sub-codes of our
expander-like codes.

4.3 Code Construction

To construct an expander-like code, we follow [3]. We generate a balanced regular
bipartite graph G from a projective space. A projective space of dimension n
over GF(q), P(n, GF(q)), has at least the following two properties, arising out of
inherent duality:

1. The number of subspaces of dimension m is equal to the number of subspaces of
dimension (n - m - 1).

2. The number of m-dimensional subspaces incident on each (n - m - 1)-dimensional
subspace is equal to the number of (n - m - 1)-dimensional subspaces incident
on each m-dimensional subspace.

We associate one vertex of the graph with each m-dimensional subspace and one with
each (n - m - 1)-dimensional subspace. Two vertices are connected by an edge if
the corresponding subspaces are incident on each other. As edges lie only between
subspaces of different dimensions, the graph is bipartite, with vertices associated with
m-dimensional subspaces forming one set and vertices associated with
(n - m - 1)-dimensional subspaces forming the other. Also, the two properties listed above
ensure that both vertex sets have the same number of elements and that each vertex has
the same degree. To be able to quantify various properties of the constructed code, we
hereafter specifically consider the graph G = (V, E) obtained by taking the points and
hyperplanes of P(5, GF(2)). In this projective space, the number of points (= number
of hyperplanes) is $\phi(5, 0, 2) = 63$. Each point is incident on $\phi(4, 3, 2) = 31$ hyperplanes
and each hyperplane has $\phi(4, 0, 2) = 31$ points. Therefore, we have |V| = 126 and
|E| = 1953. This implies that the block length of code C is 1953 and the number
of constraints in the code is 126. The second eigenvalue of G, $\lambda$, is 4, according to
a formula from [7]. Hence the ratio $\lambda/d$ is quite small, as required for the design of
"good" expander codes.
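These graph parameters can be reproduced with a few lines of code. The sketch below assumes one particular labelling that is not spelled out in the text: points and hyperplanes of P(5, GF(2)) are both labelled by the nonzero vectors of GF(2)^6, stored as integers 1..63, and a point lies on a hyperplane when their dot product over GF(2) is zero. The eigenvalue is obtained from the standard symmetric-design identity N N^T = (k - mu) I + mu J rather than from the formula of [7]; the function names are illustrative.

```python
import math
from itertools import combinations


def build_pg_graph(n: int = 5) -> dict:
    """Point-hyperplane incidence graph of P(n, GF(2)).

    Keys are points, values are the hyperplanes incident on each point.
    Both are labelled by nonzero (n+1)-bit integers (an assumed encoding).
    """
    labels = range(1, 2 ** (n + 1))
    return {p: [h for h in labels if bin(p & h).count("1") % 2 == 0]
            for p in labels}


def summarize(graph: dict):
    """Return (set of point degrees, edge count, set of pairwise overlaps)."""
    degrees = {len(hs) for hs in graph.values()}
    edges = sum(len(hs) for hs in graph.values())
    # Number of hyperplanes shared by each pair of distinct points.
    common = {len(set(graph[p]) & set(graph[q]))
              for p, q in combinations(graph, 2)}
    return degrees, edges, common


graph = build_pg_graph(5)
degrees, num_edges, common = summarize(graph)
k = next(iter(degrees))
mu = next(iter(common))
# For a symmetric design, the nontrivial singular value of the incidence
# matrix is sqrt(k - mu); this is the second eigenvalue of the bipartite graph.
second_eigenvalue = math.sqrt(k - mu)

if __name__ == "__main__":
    print("vertices per side:", len(graph))
    print("degree:", k, " edges:", num_edges)
    print("second eigenvalue:", second_eigenvalue)
```

The check that every pair of points shares the same number of hyperplanes is what justifies treating the incidence structure as a symmetric design in this sketch.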

As the expander graph G is 31-regular, the block length of the component code must
also be 31 [1]. We choose the 31-symbol shortened Reed-Solomon code as the component
code, with each symbol consisting of eight bits. To gain a performance advantage, we
also modify Zemor's decoding algorithm [3] as follows. If a particular vertex detects
more errors than it can correct, we skip the decoding for that vertex. In Zemor's
algorithm, the decoding is still carried out in such a case, which can possibly lead to
a (different) codeword with more errors. The modification is possible because
the Berlekamp-Massey algorithm for RS decoding [8] can report, as a side output,
whether the errors in the current input block of symbols to the decoder can be
corrected or not. If not, then the algorithm can be made to skip decoding,
thus preserving the errors in the input block. This variation in decoding reduces the
number of extra errors introduced by that vertex if the decoding fails. Based on this
decoding algorithm, a MATLAB model of the decoder was first made, to observe the code's
performance as discussed next.
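The skip-on-failure rule can be illustrated with an idealized model. The sketch below is not the authors' MATLAB model: it assumes the all-zero codeword was sent, tracks only which edges of the point-hyperplane graph carry a wrong symbol, and assumes each component decoder either clears all errors on its edges (when there are at most t of them) or detects the overload and skips. Miscorrection by a real RS decoder is deliberately ignored, and all names are hypothetical.

```python
import random


def incident(p: int, h: int) -> bool:
    """Point p lies on hyperplane h (dot product over GF(2) is zero)."""
    return bin(p & h).count("1") % 2 == 0


def decode(errors, t, max_iters=4):
    """Idealized skip-on-failure decoding.

    errors: set of (point, hyperplane) edges carrying a wrong symbol
    t:      number of symbol errors a component decoder can correct
    Returns (remaining_errors, iterations_used).
    """
    errors = set(errors)
    for it in range(1, max_iters + 1):
        for side in (0, 1):                    # 0: point side, 1: hyperplane side
            load = {}
            for edge in errors:
                load.setdefault(edge[side], []).append(edge)
            for edges in load.values():
                if len(edges) <= t:            # correctable: clear these errors
                    errors.difference_update(edges)
                # otherwise decoding is skipped and the errors are preserved
        if not errors:
            return errors, it
    return errors, max_iters


def random_errors(count, rng, n=5):
    """Pick `count` distinct erroneous edges of the P(n, GF(2)) graph."""
    labels = range(1, 2 ** (n + 1))
    all_edges = [(p, h) for p in labels for h in labels if incident(p, h)]
    return set(rng.sample(all_edges, count))


if __name__ == "__main__":
    rng = random.Random(1)
    remaining, iters = decode(random_errors(50, rng), t=2)
    print("errors left:", len(remaining), "after", iters, "iteration(s)")
```

Because vertices on the same side never share an edge, processing them one after another inside a half-iteration is equivalent to decoding them in parallel.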

5 Performance of Code for Random Errors 

To benchmark the error-correction performance in the presence of random errors, we varied
the minimum distance of the component code, and simulated the MATLAB decoder
model. Random symbol errors were introduced at random locations of the zero vector.
Convergence of the decoder's output back to the zero vector was checked after
simulation. As our code is linear, the performance obtained in testing for the zero vector
is valid for the entire code. Since the errors were introduced at random locations,
simulations were run over many different rounds of decoding with different pseudo-random
sequences as inputs, and averaged, to get reliable results. These sequences differ in the
random positions in which the errors are introduced. Each round of decoding for a
particular input further involves several iterations of the decoding algorithm. One
iteration of decoding corresponds to both sides of the bipartite graph finishing the
decoding of their component codes.

It is observed experimentally that in case of a decoding failure, beyond 4 iterations,
the number of errors in the codeword stabilizes (referred to as a fixed point in [9]). In
the first few iterations, as the number of errors in the overall code decreases in a
particular iteration, the number of errors to be handled on average by the RS decoders in
the next iteration is smaller. Hence probabilistically, and experimentally, these
component decoders converge faster, thus increasing the percentage of errors corrected in
the next iteration. However, after at most 4 iterations, no further reduction in errors
was seen. This phenomenon can most probably be attributed to infinite
oscillations of errors in an embedded subgraph, to be described in the next section.

Hence we have fixed the stopping threshold of the decoder to exactly 4 iterations, not only
in simulation but also in the practical design of a CD-ROM decoder. In simulation, if
there are non-zero entries remaining after 4 iterations, then the decoding is considered
to have failed. In real life, if one or more component RS decoders fail to converge at the
4th iteration, then again decoding is considered to have failed. The results of our
simulations are presented in Tables 1 and 2. The component codes used for these
simulations have minimum distance 5 and 7, respectively. The "failures" column represents
the percentage of failed decoding attempts. The "average number of iterations" column
signifies the average number of iterations required for successful decoding of a corrupt
codeword, over various rounds.

No. of errors    Failures (%)    Avg. no. of iterations
50               0               1
80               1               1.71
100              18              2.33
110              40              2.72

Table 1: Random errors (e = 5)

No. of errors    Failures (%)    Avg. no. of iterations
150              0               1.6
175              0               1.99
200              0               2.19
250              23              3.82
275              64              4.5

Table 2: Random errors (e = 7)

We present some worst-case bounds on the rate and error-correction capability of our codes
in Table 3. We vary the minimum distance of the subcode between 3 and 15. Beyond 15,
the rate of the overall code C becomes very low and hence impractical. We also compare
these bounds to the bounds derived by Zemor. To calculate the Zemor bounds and
make a fair comparison, we need to remove the advantage of using Reed-Solomon
codes as sub-codes. Zemor derived the bounds for general codes assuming that
e/2 or more errors cannot be corrected for any distance (even or odd). For the Reed-Solomon
component codes used in our construction, since we use only odd distances,
(e + 1)/2 or more errors can never be corrected. To account for this, we replace e/2 by
(e + 1)/2 in Zemor's formula to calculate the bounds.



Min. dist. of    Subcode    Lower bound     Error-correcting    Zemor's bound
subcode (e)      rate       on rate of C    capability of C     for C
3                0.94       0.87            3
5                0.87       0.74            8
7                0.81       0.61            15
9                0.74       0.48            24
11               0.68       0.35            35
13               0.61       0.23            48                  42
15               0.55       0.1             87                  65

Table 3: Change in parameters of C with variation in minimum distance of subcode

We now give a geometrical analysis of the process of error correction in the overall code C,
and use results from this analysis to derive the bounds on the error correction
capability of C. First of all, in P(5, GF(2)), points are the 0-dimensional
subspaces and hyperplanes are the 4-dimensional subspaces. Moreover, planes are the
2-dimensional subspaces of the projective space, and are symmetric with respect to
points and hyperplanes. Finally, 7 points are contained in a plane and a plane is
contained in 7 hyperplanes in P(5, GF(2)).
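As a quick check of these two counts, one can plug d = 5, s = 2 into the counting rules of Section 2, taking $\phi(d, m, s)$ as the number of m-dimensional flats of a d-dimensional space over GF(s) and $\phi(d - l - 1, m - l - 1, s)$ as the number of m-dimensional flats through a fixed l-dimensional flat. The worked arithmetic below is a sketch under that reading of the notation.

```latex
\begin{align*}
\#\{\text{points in a plane}\}
  &= \phi(2, 0, 2) = \frac{2^{3} - 1}{2 - 1} = 7, \\
\#\{\text{hyperplanes through a plane}\}
  &= \phi(5 - 2 - 1,\; 4 - 2 - 1,\; 2) = \phi(2, 1, 2)
   = \frac{(2^{3} - 1)(2^{2} - 1)}{(2 - 1)(2^{2} - 1)} = 7.
\end{align*}
```

The equality of the two counts reflects the point-hyperplane symmetry of planes in P(5, GF(2)) noted above.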

To understand the limits, given the minimum distance e of the subcode, we need to
find the minimum number of random errors to be introduced in C which will cause
the decoding to fail. This will happen if the vertices corresponding to the points and
hyperplanes get locked in such a way that in each iteration, an equal number of
constraints fail on each side. This is the minimal configuration of failure. As explained
next, errors can expand over iterations (more edges represent corrupt symbols), but
that is not the minimum configuration of failure. Similarly, if errors shrink, i.e. fewer
vertices in the bipartite graph fail in the next iteration, then it leads to a case of
decoding convergence, not decoding failure.

For example, if we consider e = 5, each vertex of the graph can correct up to 2 erroneous
symbols ($\lfloor e/2 \rfloor$) in the set of symbols that it is decoding. If 3 or more erroneous
symbols are given to it, then either the decoder, based on the Berlekamp-Massey algorithm,
skips decoding, or it outputs another codeword that in the worst case differs from the
transmitted codeword in at least e symbols, and hence has at least e errors. However, as
discussed earlier, we are not interested in the latter case. So, if we can generate a case in
which decoding of the sub-code fails at vertices corresponding to 3 points, all of which
are incident on 3 hyperplanes, we have a situation in which the 3 points transfer
at least 3 errors to each of the vertices corresponding to the 3 hyperplanes. These vertices
may also fail, or decode a different codeword, while decoding their inputs. Again, in the
worst case each of these hyperplane decoders will output at least 3 erroneous symbols.
These corrupt symbols, or errors, are then transferred back to the vertices representing
the 3 points. Thus, the errors keep oscillating indefinitely from one side of the
graph to the other, and the decoder never decodes the right codeword. Thus, a
minimum of 3 * 3 = 9 errors is required to cause a failure of decoding. This can be
seen from Figure 2. For any case in which fewer than 9 corrupt symbols exist, by the
pigeonhole principle, we will have at least one hyperplane or point having fewer than
3 errors incident on it. The decoder corresponding to that vertex will correctly decode
the sub-code, thus reducing the total number of errors flowing in the overall decoder
system of C. This will, in the next iteration, cause some other hyperplane or point to have
fewer than 3 errors. Thus in the subsequent iterations, all the errors will be
removed. Therefore, 8 errors or fewer will always be corrected. This is illustrated
in Figure 3. As we can see from Table 1, the worst-case scenario is very unlikely to
occur, and for randomly placed errors, even 80 errors are found to be corrected 99% of
the time.

Figure 2: Subgraph that causes failure (e = 5). The 9 errors oscillate between iterations.

Figure 3: 8 or less errors always corrected. Panels: corrupted subgraph; after points
decode; after hyperplanes decode.

Now the question is, when can we find a configuration in which 3 points are all incident
on 3 hyperplanes? If we choose any plane in the given geometry, we can pick any
three points of that plane and any 3 hyperplanes containing the same plane.
This ensures that all 3 points are incident on all 3 hyperplanes. Thus, if
for some input the decoding at these 3 points fails, in the worst case they will corrupt
all the edges incident on them. This in turn would cause 3 errors each on the chosen
hyperplanes. Hence the errors would oscillate between points and hyperplanes in each
successive iteration. Thus, in the worst case, there need to be 9 erroneous symbols,
located such that they are incident on the 3 chosen points, to cause the decoding of C
to fail.
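The plane-based configuration can be built explicitly. The sketch below assumes a labelling that is not given in the text: points and hyperplanes of P(5, GF(2)) are nonzero 6-bit integers, with incidence meaning a zero dot product over GF(2). The particular plane (spanned by the first three unit vectors) and the helper names are arbitrary illustrative choices.

```python
def incident(p: int, h: int) -> bool:
    """Point p lies on hyperplane h (dot product over GF(2) is zero)."""
    return bin(p & h).count("1") % 2 == 0


def span(vectors):
    """Nonzero vectors in the GF(2)-span of the given vectors (as ints)."""
    out = {0}
    for v in vectors:
        out |= {x ^ v for x in out}
    return out - {0}


# One plane of P(5, GF(2)): the span of three independent points.
plane_points = span([0b000001, 0b000010, 0b000100])

# Hyperplanes containing the plane are those incident on all of its points.
plane_hyperplanes = [h for h in range(1, 64)
                     if all(incident(p, h) for p in plane_points)]

# Pick any 3 points and any 3 hyperplanes of the plane; every pair is an edge.
pts = sorted(plane_points)[:3]
hyps = plane_hyperplanes[:3]
stuck = {(p, h) for p in pts for h in hyps}

# Errors seen by each vertex if all 9 edges carry a wrong symbol.
load_per_point = {p: sum(1 for e in stuck if e[0] == p) for p in pts}
load_per_hyperplane = {h: sum(1 for e in stuck if e[1] == h) for h in hyps}

if __name__ == "__main__":
    print("points in plane:", len(plane_points))
    print("hyperplanes through plane:", len(plane_hyperplanes))
    print("errors per point:", load_per_point)
    print("errors per hyperplane:", load_per_hyperplane)
```

With e = 5, each component decoder corrects at most 2 symbols, so a load of 3 at every vertex on both sides is exactly the locked configuration described above.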

In general, if we are given a minimum distance e of the subcode, it is known that at
each vertex, more than $(e - 1)/2$ errors will not be corrected. So, in our graph we need
to find the minimum number of vertices $\xi$ required on each side to get an embedded
bipartite subgraph such that each vertex in the subgraph has a degree of at least
$(e + 1)/2$ towards vertices on the other side of the subgraph. Once this number of
vertices in some embedded subgraph has been found, the number of errors that can always
be corrected by our decoder is given by:

\[ \xi \cdot \frac{e + 1}{2} - 1 \]

In P(5, GF(2)), a plane has 7 points, and is contained in 7 hyperplanes. For
$3 \le e \le 13$, e being odd, the minimum number of vertices $\xi$ corresponds to $(e + 1)/2$, and
the corresponding points and hyperplanes can be picked from any plane. For $e \ge 15$,
the calculation of $\xi$ is non-trivial, since points and hyperplanes from one plane are
not sufficient. This is because $(e + 1)/2 = 8$ for e = 15, which would require us to get a
subgraph of minimum degree 8. Construction of such an embedded subgraph is not possible by
choosing only one plane. In the next few sections, we give a detailed proof to show the
non-existence of a minimum degree 8 embedded bipartite subgraph having 9 or 10 vertices on
each side, within the point-hyperplane graph of P(5, GF(2)). Assuming that a subgraph
with 11 vertices on each side exists, having a vertex degree of at least 8 (we have not
been able to disprove it yet), we get the bound stated in Table 3 for e = 15. Since our
constructions are exact, we can use these tight lower bounds for all practical values of e,
wherever calculation of $\xi$ is possible. Otherwise, another looser lower bound can be found
using Lemma 1 of [1], but its derivation uses eigenvalue calculations. Even without that,
it is clear from this table that the bound is much better than the bound obtained
by Zemor [3] using eigenvalue arguments.
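The columns of Table 3 other than Zemor's bound can be recomputed from the quantities discussed so far. The sketch below rests on assumptions inferred from the text rather than stated as a formula in it: the component code is an MDS code of length 31, so its dimension is 31 - e + 1; the rate bound is 2r - 1; and the guaranteed correction capability is the size of the smallest locked subgraph minus one, with `xi` vertices per side each of degree (e + 1)/2. The function name `table_row` is hypothetical.

```python
from fractions import Fraction

D = 31  # component code length, equal to the degree of the graph


def table_row(e: int, xi: int = None):
    """Return (subcode rate, lower bound on rate of C, guaranteed capability).

    e:  odd minimum distance of the Reed-Solomon component code
    xi: vertices per side in the smallest locked subgraph; defaults to
        (e + 1) / 2, the value obtained from a single plane.
    """
    r = Fraction(D - e + 1, D)          # MDS code: dimension k = D - e + 1
    half = (e + 1) // 2                 # smallest uncorrectable error count
    xi = half if xi is None else xi
    capability = xi * half - 1          # one fewer than the smallest locked pattern
    return float(r), float(2 * r - 1), capability


if __name__ == "__main__":
    for e in (3, 5, 7, 9, 11, 13):
        rate, rate_c, cap = table_row(e)
        print(f"e = {e:2d}: rate = {rate:.2f}, rate of C >= {rate_c:.2f}, corrects {cap}")
    # For e = 15 the text assumes a locked subgraph with 11 vertices per side.
    rate, rate_c, cap = table_row(15, xi=11)
    print(f"e = 15: rate = {rate:.2f}, rate of C >= {rate_c:.2f}, corrects {cap}")
```

Passing `xi` explicitly keeps the e = 15 case visibly separate, since that value depends on an existence assumption that the text says has not been settled.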

5.1 Bound in case of e = 15 

Earlier in this section, we highlighted the method to analyze the error correction bounds
for our code. We also derived the bounds for $3 \le e \le 13$. For the complex case
of e = 15, we present in the next few sections an overview of the proofs establishing the
non-existence of certain subgraph embeddings, which help us improve the bound, as
reflected in Table 3. The detailed proofs are presented in Appendix A. Appendix B
presents an eigenvalue-based approach (similar to Zemor's arguments).

We first describe the two theorems that prove the non-existence of certain minimal
embeddings.

Theorem 4. In the construction of the bipartite graph mentioned above, there exists no
embedded subgraph having partitions of size 9 in which every vertex has degree at
least 8, i.e. minimum degree ($\delta$) of 8.

Theorem 5. In the construction of the bipartite graph mentioned above, there exists no
embedded subgraph having partitions of size 10 in which every vertex has degree at
least 8, i.e. minimum degree ($\delta$) of 8.

6 Vector Space Representation of Geometry 

In this section, we outline certain details used in the proofs of the above theorems in
Appendix A. To recall, the points of an n-dimensional projective space over a field F can
be taken to be the equivalence classes of nonzero vectors in the (n + 1)-dimensional
vector space over F. Vectors in an equivalence class are all scalar multiples of one
another. These equivalence classes being one-dimensional subspaces, they also represent the
rays of a vector space passing through the origin. The orthogonal subspace of each such ray is
a unique n-dimensional subspace of $F^{n+1}$, known as a hyperplane. Each vector h of
such an orthogonal subspace is linked to the ray p by a dot product as follows.

$$p_0 h_0 + p_1 h_1 + \cdots + p_n h_n = 0$$

where $p_i$ is the i-th coordinate of p. This uniqueness implies a bijection, and hence a vector
p can be used to represent a hyperplane subspace, which is exclusive of this vector as
a point. Due to duality, a similar statement can be made about a hyperplane subspace.
Hereafter, whenever we say that two projective subspaces of the same dimension are
independent, we mean the linear independence of the corresponding vector subspaces in
the overall vector space.
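
The correspondence above can be made concrete with a small sketch for P(5, GF(2)). It assumes that points and hyperplanes are both stored as nonzero vectors of GF(2)^6 (over GF(2) each ray has a single nonzero representative), and that a point p lies on the hyperplane represented by h exactly when their dot product is zero. The function names are illustrative only and are not taken from the paper.

```python
from itertools import product

N = 5  # projective dimension; vectors live in GF(2)^(N + 1)

def nonzero_vectors(dim):
    """All nonzero vectors of GF(2)^dim, as tuples of 0/1."""
    return [v for v in product((0, 1), repeat=dim) if any(v)]

def dot(p, h):
    """Dot product over GF(2)."""
    return sum(a * b for a, b in zip(p, h)) % 2

points = nonzero_vectors(N + 1)       # one nonzero vector per ray
hyperplanes = nonzero_vectors(N + 1)  # a hyperplane is named by its orthogonal vector h

# Point-hyperplane incidence: p lies on h iff p . h = 0 over GF(2).
incidence = {h: [p for p in points if dot(p, h) == 0] for h in hyperplanes}

num_edges = sum(len(ps) for ps in incidence.values())
print(len(points), len(hyperplanes), num_edges)
```

Under these assumptions the sketch yields 63 vertices on each side, 31 points on every hyperplane, and 63 × 31 = 1953 edges, which agrees with the code length quoted in the Results section.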

6.1 Relationship between Projective Subspaces 

Throughout the remainder of the paper, we will be relating projective subspaces of
various types. We define the following terms for relating projective subspaces.

[Contained in]

If a projective subspace X is said to be contained in another projective subspace
Y, then the vector subspace corresponding to X is itself a subspace of the
vector subspace corresponding to Y. In terms of projective spaces, the points that
are part of X are also part of Y. The inverse relationship is termed 'contains',
e.g. "Y contains X".

[Reachable from]

If a projective subspace X is said to be reachable from another projective subspace
Y, then there exists a chain (path) in the corresponding lattice diagram of
the projective space, such that both the flats X and Y lie on that particular
chain. There is no directional sense in this relationship.

7 Lemmas Used in Proof 

In this section, we describe certain useful lemmas related to the proofs of Theorems 4 and
5. These short lemmas and their proofs provide an insight into the way the detailed proofs
were built up for Theorems 4 and 5 in Appendix A.

7.1 Lemmas Related to Projective Space 

Lemma 6. In projective spaces over GF(2), any subset of points (hyperplanes) having
cardinality of 4 or more has 3 non-collinear (independent) points (hyperplanes).

Proof. The underlying vector space is constructed over GF(2). Hence, any 2-dimensional
subspace contains the zero vector, and non-zero vectors of the form $\alpha a + \beta b$. Here,
a and b are linearly independent non-zero vectors, and $\alpha$ and $\beta$ can
be either 0 or 1, but not both 0:

$$\alpha, \beta \in GF(2) : (\alpha, \beta) \neq (0, 0).$$

Thus, any such 2-d subspace contains exactly 3 non-zero vectors. Therefore, in any
subset of 4 or more points of a projective space over GF(2) (which represent one-dimensional
non-zero vectors in the corresponding vector space), at least one point
is not contained in the 2-d subspace formed by 2 arbitrarily picked points from the
subset. Thus in such a subset, a further subset of 3 independent points (hyperplanes), i.e.
3 non-collinear vectors, can always be found. □

Lemma 7. Let there be 7 hyperplanes H1, ..., H7 reachable from a plane P1 in
PG(5,GF(2)). Let there be any other plane P2, which may or may not intersect with
the point set of plane P1. Then, any point on P2 which is not reachable from plane
P1 can maximally be reachable from 3 of these 7 hyperplanes, and vice-versa. Further,
these 3 hyperplanes cannot be independent. Dually, any hyperplane containing P2, and
not containing P1, can maximally be reachable from 3 of the 7 points contained in P1,
which are not independent, and vice-versa.

Proof. If a point on plane P2 which is not reachable from plane P1 lies on 4 or more
hyperplanes (out of 7) reachable from plane P1, then by Lemma 6 we can always find a
subset of 3 independent hyperplanes in this set of 4. In that case, the point will also
be reachable from the linear combinations of these 3 independent hyperplanes, and hence
from all the 7 hyperplanes reachable from plane P1. This contradicts the assumption that
the point under consideration does not lie on plane P1. Hence, if the point considered
above lies on 3 hyperplanes reachable from P1, then these 3 hyperplanes cannot be
independent, following the same argument as above. Otherwise it is indeed possible for
such a point to lie on 3 hyperplanes, e.g. in the case of the planes P1 and P2 being
disjoint. The roles of planes P1 and P2 can be interchanged, as well as the roles of points
and hyperplanes, to prove the remaining alternate propositions. □

Lemma 8. Let there be 7 hyperplanes H1, ..., H7 reachable from a plane P1 in
PG(5,GF(2)). Further, let there be any other plane P2, which intersects P1 in a
line. Then, there exist 4 hyperplanes reachable from plane P1 which do not contain
any of the 4 points that are in plane P2, but not in plane P1.

Proof. A line contains 3 points in PG(5,GF(2)). Hence, P1 and P2 intersect in 3
points. By duality arguments, they intersect in 3 hyperplanes as well. Hence, P2
contains 7 - 3 = 4 points that are not common to the point set of P1. By Lemma 7, these
4 points can at maximum lie on 3 hyperplanes reachable from plane P1. Since there
are 3 hyperplanes common to P1 and P2, and hence these 4 points already lie on
them, they do not further lie on any more hyperplanes reachable from P1, but not from
P2. Hence the remaining 7 - 3 = 4 hyperplanes reachable from P1 contain none of these
4 points. □
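
The three lemmas above lend themselves to a brute-force spot check. The sketch below is only a sanity check under stated assumptions, not a substitute for the proofs: vectors of GF(2)^6 are stored as bitmask integers (addition is XOR), a point q lies on the hyperplane named by h when their GF(2) dot product is 0, and the particular planes P1 and P2 are one hypothetical choice of two planes meeting in a line.

```python
from itertools import combinations

def dot(u, v):
    """Dot product over GF(2) of two vectors stored as bitmask integers."""
    return bin(u & v).count("1") % 2

def span(gens):
    """Nonzero vectors in the GF(2)-span of the generators (addition is XOR)."""
    vecs = {0}
    for g in gens:
        vecs |= {v ^ g for v in vecs}
    return vecs - {0}

ALL = range(1, 64)  # nonzero vectors of GF(2)^6: points, and also hyperplane labels

# Lemma 6: three distinct points are collinear iff they XOR to zero, so every
# 4-subset of points must contain a triple whose XOR is nonzero.
lemma6_holds = all(
    any(a ^ b ^ c for a, b, c in combinations(quad, 3))
    for quad in combinations(ALL, 4)
)

P1 = span([0b000001, 0b000010, 0b000100])
P2 = span([0b000001, 0b000010, 0b001000])  # meets P1 in the line {1, 2, 3}
H = [h for h in ALL if all(dot(h, p) == 0 for p in P1)]  # hyperplanes reachable from P1

# Lemma 7: a point off P1 lies on at most 3 of these hyperplanes, and when it
# lies on 3 of them they are dependent (they XOR to zero).
lemma7_holds = True
for q in ALL:
    if q in P1:
        continue
    hs = [h for h in H if dot(h, q) == 0]
    lemma7_holds = lemma7_holds and len(hs) <= 3
    if len(hs) == 3:
        lemma7_holds = lemma7_holds and (hs[0] ^ hs[1] ^ hs[2] == 0)

# Lemma 8: hyperplanes through P1 that avoid every point of P2 outside P1.
extra_points = P2 - P1
avoiding = [h for h in H if all(dot(h, q) == 1 for q in extra_points)]

print(lemma6_holds, len(H), lemma7_holds, len(extra_points), len(avoiding))
```

For this choice of planes the check finds 7 hyperplanes through P1, 4 points of P2 outside P1, and 4 hyperplanes avoiding all of them, in line with Lemma 8.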
Lemma 9. Let there be 7 hyperplanes H1, ..., H7 reachable from a plane P1 in
PG(5,GF(2)). Further, let there be any other plane P2, which intersects P1 in a
point. By Lemma 7, the 6 hyperplanes not containing both P1 and P2 still intersect P2
at most in a line each. Then, (a) such lines contain the point common to P1 and
P2, and hence exactly 2 more out of the remaining 6 points of P2 that are not common to
P1, and (b) 3 pairs of hyperplanes out of the 6 non-common hyperplanes intersect in
a (distinct) line each, out of the 3 possible lines in P2 containing the common point.

Proof. Let $A_c$ be the common atom (point) between planes P1 and P2. By duality,
exactly one hyperplane will be common to both P1 and P2. Let some non-common
hyperplane $H_{nc}$ reachable from plane P1 intersect plane P2 in a line L1, that is,
$H_{nc} \cap P2 = L1$. Then, $A_c \in L1$. For if it is not, then

$$|L1 \cup A_c| = 4$$

Also, $$L1 \cup A_c \subseteq H_{nc}$$

and, $$L1 \cup A_c \subseteq P2$$

$\Rightarrow |H_{nc} \cap P2| = 4$, a contradiction to Lemma 7. Hence, the line L1 contains the common
point $A_c$ and 2 more out of the remaining 6 points of P2 that are not common to P1.

Exactly 3 hyperplanes of the nature H1, H2, H1 + H2 intersect in a 4-dimensional
subspace in PG(5,GF(2)). Such a subspace can always be formed by taking the union
of a plane and a line intersecting the plane in a point, by rank arguments. Let the
hyperplane common to P1 and P2 be $H_c$. Let the other hyperplanes reachable from P1
be H1, H2, H1 + $H_c$, H2 + $H_c$, H1 + H2, H1 + H2 + $H_c$. Then, the pairs of hyperplanes
(H1, H1 + $H_c$), (H2, H2 + $H_c$), and (H1 + H2, H1 + H2 + $H_c$), along with $H_c$, form
3 distinct 4-d subspaces, which leads to 3 distinct lines of meet under plane P2, one for
each pair. This can also be verified from the fact that there are exactly 3 distinct
lines in plane P2 that have the point $A_c$ in common. These 3 lines, and their individual
unions with P1, lead to reachability from {H1, $H_c$, H1 + $H_c$}, {H2, $H_c$, H2 + $H_c$}, and
{H1 + H2, $H_c$, H1 + H2 + $H_c$}, respectively. □

Lemma 10. Let there be two hyperplanes H1 and H2 meeting in a plane P1. Both
H1 and H2 intersect any plane P2 disjoint from P1 in exactly a line, by lemma [?].
Then these intersecting lines L1 (of H1 and P2) and L2 (of H2 and P2) cannot be the
same.

Proof. Let the vector space of a projective geometry flat X be denoted by V(X).
Flats are sets of points, each of which bijectively corresponds to a 1-d vector in the
corresponding vector space. Also, closure of a flat (in terms of containing a point) is
defined as the corresponding closure of the vector subspace. Hence, the family of
substructures in a projective space is bijectively intertwined with the family of subspaces in the
corresponding vector space. Then, if L1 = L2 were true, then

$$V(L1) = V(L2) \qquad (5)$$

where

$$V(L1) = V(H1) \cap V(P2) \qquad (6)$$
$$V(L2) = V(H2) \cap V(P2) \qquad (7)$$

Also, it is given that

$$V(H1) \cap V(H2) = V(P1) \qquad (8)$$
$$V(P1) \cap V(P2) = \{0\} \qquad (9)$$

Hence if one takes the closure of the set of vectors contained in V(L1) ∪ V(P1) (L1 is part
of P2, which does not intersect with P1), it does generate the entire vector subspace
V(H1). Similarly, the closure of the set of vectors contained in V(L2) ∪ V(P1) generates the
entire vector subspace V(H2). Then from equation (5), the generated subspaces V(H1)
and V(H2) coincide, a contradiction. □



7.2 Lemmas Related to Embedded Graphs 

Lemma 11. In a bipartite graph having 9 vertices each in both partite sets, and having
a minimum degree ($\delta$) of at least 8, any collection of 3 vertices from one side is incident
on at least 6 common vertices on the other side.

Proof. Let the vertices of one side be denoted as (a1, a2, ..., a9), and those of the other
side by (b1, b2, ..., b9). Given $\delta = 8$, it is obvious that the minimal intersection of the
neighborhoods of a1 and a2 happens in some (at least) 7 vertices from the other side.
Then the 2 remaining vertices are $N(a1) - N(a1) \cap N(a2)$ and $N(a2) - N(a1) \cap N(a2)$,
respectively. The neighborhood of vertex a3 may either contain all these 7 vertices
($N(a1) \cap N(a2)$), or the two vertices $N(a1) - N(a1) \cap N(a2)$ and $N(a2) - N(a1) \cap N(a2)$,
and at least 6 vertices out of $N(a1) \cap N(a2)$. Hence the minimal intersection of the
neighborhoods of arbitrarily chosen vertices a1, a2 and a3 is of cardinality 6. □

Lemma 12. In a bipartite graph having 10 vertices each in both partite sets, and
having a minimum degree ($\delta$) of at least 8, any collection of 3 vertices from one side is
incident on at least 4 common vertices on the other side.

Proof. Let the vertices of one side be denoted as (a1, a2, ..., a10), and those of the other
side by (b1, b2, ..., b10). Given $\delta = 8$, it is obvious that the minimal intersection of the
neighborhoods of a1 and a2 happens in some (at least) 6 vertices from the other side.
The two vertices in $N(a1) - N(a1) \cap N(a2)$ and two more in $N(a2) - N(a1) \cap N(a2)$
account for the 4 remaining vertices on the other side. The neighborhood of vertex a3
may either contain all these 6 vertices ($N(a1) \cap N(a2)$), or at most all 4 vertices of
$N(a1) - N(a1) \cap N(a2)$ and $N(a2) - N(a1) \cap N(a2)$, and at least 4 vertices out of
$N(a1) \cap N(a2)$. Hence the minimal intersection of the neighborhoods of arbitrarily chosen
vertices a1, a2 and a3 is of cardinality 4. □
Lemma 13. In a bipartite graph having 11 vertices each in both partite sets, and
having a minimum degree ($\delta$) of at least 8, any collection of 3 vertices from one side is
incident on at least 2 common vertices on the other side.

Proof. Let the vertices of one side be denoted as (a1, a2, ..., a11), and those of the other
side by (b1, b2, ..., b11). Given $\delta = 8$, it is obvious that the minimal intersection of the
neighborhoods of a1 and a2 happens in some (at least) 5 vertices from the other side.
The three vertices in $N(a1) - N(a1) \cap N(a2)$ and three more in $N(a2) - N(a1) \cap N(a2)$
account for the 6 remaining vertices on the other side. The neighborhood of vertex a3
may either contain all these 5 vertices ($N(a1) \cap N(a2)$), or at most all 6 vertices of
$N(a1) - N(a1) \cap N(a2)$ and $N(a2) - N(a1) \cap N(a2)$, and at least 2 vertices out of
$N(a1) \cap N(a2)$. Hence the minimal intersection of the neighborhoods of arbitrarily chosen
vertices a1, a2 and a3 is of cardinality 2. □
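
The three counting lemmas above are instances of one pigeonhole bound: if each of k vertices has at least $\delta$ neighbours among m vertices on the other side, their common neighbourhood has at least $k\delta - (k-1)m$ vertices. The following sketch evaluates that bound and cross-checks it by exhaustive search for the two smaller cases. The function names are illustrative, and the exhaustive search assumes neighbourhoods of exactly $\delta$ vertices, which is the worst case.

```python
from itertools import combinations, product

def min_common_neighbours(side, delta, k=3):
    """Pigeonhole lower bound on the common neighbourhood of k vertices."""
    return max(0, k * delta - (k - 1) * side)

def brute_force_min(side, delta, k=3):
    """Smallest common neighbourhood over all choices of k neighbourhoods
    of size exactly delta (larger neighbourhoods can only grow the overlap)."""
    nbhds = [set(c) for c in combinations(range(side), delta)]
    return min(len(set.intersection(*choice))
               for choice in product(nbhds, repeat=k))

for side in (9, 10, 11):
    print(side, min_common_neighbours(side, 8))
```

For $\delta = 8$ the bound evaluates to 6, 4 and 2 for 9, 10 and 11 vertices per side, matching Lemmas 11, 12 and 13 respectively.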



8 Performance of Code for Burst Errors 

The strongest applications for this code lie in the areas of mass data storage such as
discs. Here, as pointed out earlier in Section 1, burst errors are the dominant cause of data
corruption. Hence we have also examined the burst error correction capabilities of our
code.

In the bipartite graph G constructed from P(5, GF(2)), we label the edges with integers, to
map the various symbols of the codeword. Such a labeling is not required to understand or
characterize the random error correction capability of the code. But here, we label
the edges with numbers to try to maximize the burst error correction capability. This
is achieved if consecutive symbols, possibly part of a burst, are mapped to edges
that are incident on distinct vertices representing different component decoders. Thus,
consecutively numbered edges, representing consecutively located symbols in the input
symbol stream, go to different vertices hosting different RS decoders. Since there are 63
vertices on one side of the graph, this scheme of numbering implies that the edges incident
on vertex 1 are assigned the numbers {1, 64, . . . , 1891}. Similarly, the edges incident
on vertex 2 are assigned {2, 65, . . . , 1892}, and so on. This numbering essentially
achieves the effect of interleaving of code symbols. If the error correcting capacity
of each component RS decoder is $\mu$ (= $\lfloor e/2 \rfloor$), then the minimum burst error correcting
capacity of C will be $\mu \times 63$. For example, for e = 5, $\mu$ is 2, and the minimum burst
error correcting capacity is 2 × 63 = 126. Table 4 gives MATLAB simulation results
for burst error correction for e = 5.
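
The interleaving effect of this numbering can be illustrated with a short sketch. It models only the side of the graph used for numbering, assumes 1-based symbol and vertex indices, and uses hypothetical helper names. With 1-based labels, the last edge on vertex 1 works out to 1 + 63 × 30 = 1891.

```python
NUM_VERTICES = 63                      # vertices on one side of the graph
DEGREE = 31                            # edges (code symbols) per vertex
NUM_SYMBOLS = NUM_VERTICES * DEGREE    # 1953 symbols per codeword

def vertex_of(symbol):
    """1-based vertex whose RS decoder receives the given 1-based symbol."""
    return (symbol - 1) % NUM_VERTICES + 1

def edges_of(vertex):
    """Edge labels assigned to a vertex under round-robin numbering."""
    return list(range(vertex, NUM_SYMBOLS + 1, NUM_VERTICES))

def worst_hits(burst_len):
    """Largest number of symbols of one contiguous burst that land on a
    single vertex, over all burst start positions."""
    worst = 0
    for start in range(1, NUM_SYMBOLS - burst_len + 2):
        counts = {}
        for s in range(start, start + burst_len):
            v = vertex_of(s)
            counts[v] = counts.get(v, 0) + 1
        worst = max(worst, max(counts.values()))
    return worst

print(edges_of(1)[:3], edges_of(1)[-1])
print(worst_hits(126), worst_hits(127))
```

A burst of 126 symbols places at most 2 errors on any component decoder, which an RS decoder with e = 5 can correct, while a burst of 127 symbols can place 3 errors on one decoder. This is why 126 is a guaranteed minimum rather than the typical limit.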

No. of errors    Failures    Avg. no. of iterations
126              0           1
135              26          2.43

Table 4: Burst errors (e = 5)

To demonstrate the excellent burst error correction capacity of our code, we benchmark
it against the massive interleaving based codes in CD-ROMs. These codes are
considered to be very robust to burst errors. Traditionally in ECMA-130, the encoding
utilizes heavy interleaving and dependence on erasure correction to deal with burst
errors. For erasure correction, one level of decoding identifies the possible locations of
the error symbols. The next level of decoding uses this information to correct them.
The stage/process of interleaving used in CD-ROMs makes the encoding and decoding
slower. We propose two schemes in [10], which offer significant improvement in burst
error correction at similar data rates. Our decoder, being fully parallel in its
decoding, can handle larger sets of data at a time and hence could be used to increase the
throughput. In our schemes, however, we wanted to fit our decoders in place of the
heavy interleaving stage of the CD-ROM decoding data path, which only processes
one frame at a time. Thus, in terms of throughput we will be matching the CD-ROMs,
but we will surpass them in burst error correction capability.

9 A Note on Encoding 

For the encoding process, we derive the parity matrix and find its orthogonal matrix
to get the generator matrix. Suppose e = 5, which means that for each sub-code 4
edges act as parity symbols. The parity matrix for each vertex is given by:



$$\begin{pmatrix}
1 & \alpha & \alpha^{2} & \cdots & \alpha^{30} \\
1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{60} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \alpha^{4} & \alpha^{8} & \cdots & \alpha^{120}
\end{pmatrix}$$

where $\alpha$ = 2 for us. The parity matrix for the 126 vertices is generated using the
above parity matrix and inserting the appropriate entries at the corresponding edge
locations. Each column of the overall parity matrix corresponds to an edge and there
are 126*4 rows. We then perform row operations to get it in RRE form. The generator
matrix is then easily obtained. A codeword is given by the product of the message
vector with the generator matrix.

The above stated method is not efficient because it uses matrix-vector multiplication
and hence is of $O(n^2)$. More efficient methods could be possible by utilizing the
structure of the graph. Getting an efficient encoder design is one of the possibilities of
future work in this area.
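
As an illustration of the per-vertex parity matrix only, the sketch below builds the 4 × 31 matrix with entries $\alpha^{ij}$ (i = 1..4, j = 0..30) and checks it against one codeword. Several things here are assumptions rather than details from the paper: the field is taken to be GF(32) because the block length is 31, the primitive polynomial $x^5 + x^2 + 1$ is one common choice, $\alpha$ is stored as the integer 2, and all function names are hypothetical.

```python
PRIM_POLY = 0b100101  # x^5 + x^2 + 1 (assumed field polynomial for GF(32))
ALPHA = 2             # the element "x", stored as a 5-bit integer

def gf_mul(a, b):
    """Multiply two GF(32) elements stored as integers below 32."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:      # reduce when the degree reaches 5
            a ^= PRIM_POLY
    return result

def gf_pow(a, n):
    """Raise a GF(32) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def parity_matrix(e=5, n=31):
    """Rows i = 1..e-1, columns j = 0..n-1, entry alpha^(i*j)."""
    return [[gf_pow(ALPHA, i * j) for j in range(n)] for i in range(1, e)]

def generator_poly(e=5):
    """Coefficients (lowest degree first) of prod_{i=1}^{e-1} (x + alpha^i)."""
    g = [1]
    for i in range(1, e):
        root = gf_pow(ALPHA, i)
        shifted = [0] + g                            # x * g(x)
        scaled = [gf_mul(c, root) for c in g] + [0]  # alpha^i * g(x)
        g = [s ^ t for s, t in zip(shifted, scaled)]
    return g

def syndrome(H, word):
    """Product H * word^T over GF(32); all zeros for a codeword."""
    out = []
    for row in H:
        s = 0
        for h, c in zip(row, word):
            s ^= gf_mul(h, c)
        out.append(s)
    return out

H = parity_matrix()
codeword = generator_poly() + [0] * (31 - 5)
print(len(H), len(H[0]), syndrome(H, codeword))
```

The codeword used here is just the coefficient vector of the generator polynomial padded to length 31, so its syndrome is zero by construction. Assembling the full parity matrix of the graph code additionally requires the edge-to-vertex mapping, which this sketch does not model.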
10 Results



For proof of concept, we have also done VLSI prototyping of an efficient, throughput-optimal
decoder for the expander-like code. The code used was the length-1953 code.
(31, 25, 7) Reed-Solomon codes were chosen as subcodes, and the (63 point, 63 hyperplane)
bipartite graph from P(5,GF(2)) was chosen as the expander graph. The overall
expander code was thus a (1953, 1197, 761)-code.

Viewed as a computation graph, every vertex of this graph maps to an RS decoding
computation. The input symbols to each of these decoders correspond to the edges
which are incident to the vertex in question. The prototype implementation was done
on a Xilinx XUPV506 board based on the LX110T FPGA, with a speed grade of -3. It uses
the RS decoder IP provided by Xilinx itself. The decoder can be made to skip decoding,
to accommodate the modification to Zemor's algorithm done by us. We could also use
it to perform erasure correction, since erasures arise frequently in secondary storage
device data. Using projective geometry properties, we evolved a novel, throughput-optimal
strategy to fold the parallel vertex computations, such that the number of RS
decoders required to implement the decoder for C is only a factor of the order of G. This
saves a lot of resources, and lets the design fit on even small FPGAs. The entire folding
strategy has been detailed in [5].

For such a (folded) design, about 25% of the FPGA slices were used to implement the
decoder. We used distributed RAM to implement the memory modules. The post
placement-and-routing frequency was found to be 180.83 MHz for the design without
erasures, and 180.79 MHz for the design with erasures. The design was based on
e = 5, for which it takes 2611 clock cycles to finish 4 iterations, without erasure
decoding. Adding 217 clock cycles to write data into the memory, we got a throughput
of $\frac{1953}{2828} \times 181 \approx 125$ Mbytes/s. More complete details of this implementation, and its
performance figures such as throughput, can be found in [10]. The efficient design of
the decoder is patent pending [11]. The error-correction performance obtained by simulating
this VLSI model and testing the implementation on the FPGA board is tabulated as
follows.



e    Latency    Processing Delay    Random errors    Burst errors
5    83         45                  141              143
7    115        77                  218              219
9    155        117                 328              295

Table 5: Erasure correction results
11 Applications to Storage Media



Disc storage is a general category of secondary storage mechanisms. Unlike the
now-obsolete 3.5-inch floppy disk, most removable media such as optical discs do not have
an integrated protective casing. Hence they are susceptible to data transfer problems
due to scratches, fingerprints, and other environmental problems such as dust specks.
These data transfer problems, while the data is being read, manifest themselves in the form of
bit errors in the digital data stream. A long sequence of bit read errors while a track
is being read (e.g. due to a scratch on a track) can be characterized as a burst error, while bit
read errors arising out of a tiny dust speck masking a limited number of pits and lands
on a track lead to random errors. Since such events are obviously not
rare, recovery of data to the maximum extent in the presence of such errors is an essential
subsystem within most computing systems, such as CPUs and disc players.

11.1 Application to CD-ROM 

As indicated in Section 8, we have also benchmarked the burst error correcting
capability of our code against CD-ROM decoding based on ECMA-130 [12]. Based on
this benchmarking, we have proposed 2 novel schemes for the CD-ROM encoding and
decoding stages. These schemes are based on the expander-like codes described in this
paper. Application of these codes at various stages of the CD-ROM encoding scheme (and
correspondingly in the decoding scheme) substantially increases the burst error correcting
capability of the disc drive.

To start with, we recall from [12] that the major part of the error correction of the
CD-ROM coding scheme occurs during two stages, RSPC and CIRC. On the encoder
side, the RSPC stage comes before the CIRC stage, while on the decoder side, the CIRC stage comes
before the RSPC stage. To get an idea of the average error correction capabilities of the
CD-ROM scheme, we simulated the CIRC and the RSPC stages of the ECMA standard
in MATLAB. The details of these stages are as follows.

CIRC This stage leads to interleaving of codeword symbols. The massive interleaving
done here is mainly responsible for the burst error correction. In a frame of
6976 (= 32*109*2) symbols, it can correct at most 480 symbols of burst errors.

RSPC After the CIRC stage during decoding, the remaining errors can be considered
as random errors. The RSPC stage in decoding then serves to correct these
errors using RS decoding as erasure decoding. If we consider only error detection
and correction, the CIRC + RSPC system on average corrects a burst of 270
symbols in a frame of 6976 symbols.

The schemes we propose involve replacing one or both of the CIRC and the RSPC
with encoders and decoders based on our expander-like codes. This is done in such a
way that we maximize burst error correction, without suffering too much on the data
rate.

11.1.1 Scheme 1 

In the first scheme, we replace the CIRC+RSPC subsystem with 4 decoders for our
expander-like codes, C. Here the RS subcodes used in C have block length d of
31 (symbols). Further, we fix their minimum distance as e = 7, which also implies that
their data rate is 25/31 = 0.806. The output of the corresponding 4 encoders is further
interleaved to improve performance, and de-interleaved on the receiving side.
To construct these subcodes, we take a (255,249,7) RS code, and shorten it by using the
first 31 8-bit symbols only. For the overall code C, the data rate is equal to (2r - 1) [4],
where r is the rate of the (RS) subcode. Hence the rate for the codes used in each of our
encoders/decoders is 2 * 0.806 - 1 = 0.612. Thus, the number of message symbols for
each decoder is equal to 1953 * 0.612 = 1197. The rest are therefore parity symbols.
In terms of frames, we set the cumulative input of the encoders, and correspondingly
the cumulative output of the decoders, to a stream of 199 frames, each having a payload of 24
symbols. Assuming that each symbol can be encoded in 1 byte, this leads to the generation
of 4776 bytes. With 12 padding bytes added to it, we can re-partition this bigger set of
4788 bytes into 4 blocks of 1197 bytes each (4*1197 = 4788). The 4 blocks of 1197 source
symbols can then be worked upon by 4 encoders working in parallel. After encoding
each block to 1953 symbols, one of the extra added (padding) bytes is removed from
each encoder, giving 244 frames of 32 bytes of data. Every RS decoder has e = 7,
which implies that it can detect and correct up to 3 errors. Thus, each encoder for C
will give a burst error correcting capability of 63 * 3 = 189. Since 4 such encoders
work in interleaved fashion, we get a minimum burst error detection and correction
capability of 756 among 244 frames. This is opposed to 270 in 218 frames, in the
case of the CIRC+RSPC subsystem. One can hence clearly see the massive improvement
in burst error correction, at a comparable code rate (0.62 for our code vs. 0.75 for
CIRC+RSPC). One disadvantage of this scheme is that it is hardware expensive due
to the use of many parallel RS decoders. Also, the high throughput of our decoder is not
utilized. We are limited by the stages before and after the CIRC+RSPC subsystems (see
[12]). Hence, even though our decoder is faster, the advantage is not seen.
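
The bookkeeping of Scheme 1 can be retraced with a few lines of exact arithmetic. The sketch below only re-derives the numbers quoted above (rates, block partitioning, frame counts and burst capability); the variable names are illustrative.

```python
from fractions import Fraction

NUM_VERTICES = 63     # vertices per side of the graph
SUBCODE_LEN = 31      # RS subcode block length
E = 7                 # RS subcode minimum distance
NUM_CODERS = 4        # parallel encoders/decoders in Scheme 1

subcode_rate = Fraction(SUBCODE_LEN - (E - 1), SUBCODE_LEN)  # 25/31
overall_rate = 2 * subcode_rate - 1                          # rate 2r - 1 of C
block_len = NUM_VERTICES * SUBCODE_LEN                       # 1953 symbols
message_len = overall_rate * block_len                       # message symbols per block

payload_bytes = 199 * 24                                     # 199 frames of 24 symbols
padded_bytes = payload_bytes + 12                            # 12 padding bytes
encoded_bytes = NUM_CODERS * block_len - NUM_CODERS          # drop 1 padding byte per encoder
output_frames = Fraction(encoded_bytes, 32)                  # frames of 32 bytes

t = (E - 1) // 2                                             # errors corrected per RS decoder
burst_per_coder = NUM_VERTICES * t
burst_total = NUM_CODERS * burst_per_coder

print(float(subcode_rate), float(overall_rate), message_len)
print(padded_bytes, output_frames, burst_per_coder, burst_total)
```

With exact fractions the overall rate is 19/31 ≈ 0.613, so the message length of 1197 symbols is exact rather than a rounded product.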

11.1.2 Scheme 2 

This scheme is more hardware economical, and also increases the burst error
correcting capability. Since our decoder also has a very good random error correcting
capability, we can achieve an error correction advantage by replacing just the RSPC
stage with our encoder/decoder. Two of our encoders can take the place of the RSPC
encoder in this scheme. Data from these encoders is then interleaved, and passed on to
the CIRC. In the decoding stage, after CIRC, there is correspondingly de-interleaving
followed by decoding based on our code.

This scheme has the advantage that it increases the error correction capability. It also
matches the code rate of CIRC: 0.75 for CIRC, versus 0.74 for our decoder. Also, it is
a hardware economical scheme. MATLAB simulations show that the burst error correction
capability goes up from 270 for the CIRC+RSPC subsystem, to more than 400 for CIRC and our
encoder. Tables 6 and 7 show some simulation results for this scheme.



No. of errors    Failures
270              2
300              45
400              86

Table 6: Response to Burst errors for CIRC+RSPC

No. of errors    Failures
400              7
450              17
500              26

Table 7: Response to Burst errors in Scheme 2



11.1.3 Meeting Throughput Requirement 

For a 72x CD-ROM read system, the required data transfer rate is 10.8 Mbytes/s.
Recall from Section 10 that the decoder for our codes achieved a throughput of ≈ 125
Mbytes/s. Hence, this decoder using our codes can easily be incorporated without
hurting throughput. Moreover, in an ASIC implementation, we would expect the
performance to be better.
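
The throughput margin can be checked with simple arithmetic. The sketch below assumes one reading of the Section 10 figures that reproduces the ≈ 125 Mbytes/s value: one codeword of 1953 one-byte symbols is processed every 2611 + 217 clock cycles at a clock of about 181 MHz. The variable names are illustrative.

```python
CODE_LEN_BYTES = 1953      # symbols per codeword, assumed to be 1 byte each
DECODE_CYCLES = 2611       # 4 iterations, without erasure decoding
WRITE_CYCLES = 217         # cycles to write data into memory
CLOCK_MHZ = 181            # post place-and-route clock, rounded
REQUIRED_MBYTES_S = 10.8   # 72x CD-ROM read rate

total_cycles = DECODE_CYCLES + WRITE_CYCLES
# bytes per cycle times cycles per microsecond gives Mbytes/s
throughput = CODE_LEN_BYTES / total_cycles * CLOCK_MHZ
margin = throughput / REQUIRED_MBYTES_S

print(round(throughput, 1), round(margin, 1))
```

Under these assumptions the decoder has more than a tenfold margin over the 72x requirement.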



11.2 Application to DVD-R 

The same class of codes can also be applied to evolve encoding and decoding for
DVD-ROM. The main error correction in DVD-R is provided by the RSPC block [13], which
consists of an inner RS(182,172,11) code and an outer RS(208,192,17) code. The inner
code can detect and correct up to 5 errors, while the outer code can detect and correct
8 errors. A detailed analysis shows [10] that without considering erasure decoding, we
can still detect and correct 2922 errors in a burst, in one block of 37856 bytes. If we
allow for the inner decoding to mark erasures, 5834 byte errors can be corrected.
As an alternate decoding scheme, we substitute the RSPC stage of the DVD
encoding by encoders of code C. These encoders are therefore employed during the
transformation of Data Frames into Recording Frames. We can replace the RSPC
stage with an expander-like code encoder/decoder, made from the point-hyperplane graph
of PG(8, GF(2)), and component RS(255,239,17) codes. The overall burst error
correction capability without erasures turns out to be 8 * 511 = 4088 bytes, which is
much greater than 2922. As far as random errors are concerned, MATLAB simulation
results show that around 1990 random errors are always corrected in one iteration of
the decoding itself. The existing standard specifies that the number of random errors
in 8 consecutive ECC blocks must be less than or equal to 280. The complete details
of this scheme can be found in [10].

This particular application of our codes also brings out the fact that taking a bipartite
graph G from a higher-dimensional projective space can be advantageous in terms of
better rate and better error correction capacity.
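
To compare graphs taken from projective spaces of different dimension, the basic parameters can be tabulated with a small helper. The sketch assumes the point-hyperplane graph of PG(n, GF(2)), RS subcodes whose length equals the vertex degree, the rate bound 2r - 1 quoted for Scheme 1, and the minimum burst capacity of (errors corrected per subcode) × (vertices per side). The function name and dictionary keys are illustrative.

```python
from fractions import Fraction

def graph_code_params(n, e):
    """Parameters of the code built on the point-hyperplane graph of PG(n, GF(2))
    with RS subcodes of minimum distance e and length equal to the vertex degree."""
    vertices = 2 ** (n + 1) - 1           # points (and hyperplanes)
    degree = 2 ** n - 1                   # points per hyperplane = RS block length
    subcode_rate = Fraction(degree - (e - 1), degree)
    return {
        "vertices_per_side": vertices,
        "degree": degree,
        "code_length": vertices * degree,
        "rate_bound": 2 * subcode_rate - 1,
        "min_burst": ((e - 1) // 2) * vertices,
    }

cd_like = graph_code_params(5, 7)     # PG(5, GF(2)) with (31, 25, 7) subcodes
dvd_like = graph_code_params(8, 17)   # PG(8, GF(2)) with RS(255, 239, 17) subcodes
print(cd_like)
print(dvd_like)
```

For PG(8, GF(2)) this gives 511 vertices per side, subcode length 255 and a minimum burst capacity of 4088, with a rate bound of 223/255 ≈ 0.87 against 19/31 ≈ 0.61 for PG(5, GF(2)).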

12 Conclusion 

We have presented the construction and performance analysis for an expander-like
code that is based on a bipartite graph. The graph, derived from the incidence
relations of projective spaces, offers unique advantages in terms of deriving lower bounds
on error correction capabilities. There are also fundamental advantages in terms of
hardware design of the decoder for this code. As the size of the graph increases, practical
implementation of the decoder becomes difficult. Projective geometry, through lattice
embedding properties, offers a natural way of efficiently folding the computations, which
leads to using fewer processors while guaranteeing throughput-optimality [5].

The error-correction performance of our codes is better than previously stated in the
literature. This is because it relaxes some restrictions that were imposed with respect
to the second largest eigenvalue of the graph. Derivation of bounds on error correction
has been presented, and the average case performance of the code is shown to be up
to 10 times better through simulations. Moreover, the code has special implicit
interleaving due to the numbering of the edges. This offers a great advantage in burst error
correction. A natural application of these codes with respect to data storage media
(namely CD-ROMs and DVD-R) has been explored. We have presented schemes that
improve the burst error performance in comparison to existing standards.

The hardware design for the decoder has been completely worked out. We use Xilinx RS
decoder IPs as the processors. The computations have been efficiently folded in order
to make them fit on a Xilinx LX110T FPGA. We have tested the design on the FPGA
and also exploited the erasure correction ability of the RS code. Moreover, a general
folding strategy has been developed for higher dimensions of projective geometry that
provides a methodology for practically implementing decoders of higher dimensions.
Overall, we believe that further research should establish even more usefulness of our
expander-like codes.
A Proof for Random Error Capability in case of e = 15

Before going into details of the proof, we need to establish certain cardinalities related
to PG(5,GF(2)). From the discussion in Section 2, we get the following.

1. Any line in this space is defined by any 2 points. The unique third point lying on
the line is determined by the linear combination of the corresponding one-dimensional
subspaces. Hereafter, a line will be represented as a tuple (a, b, a+b).

2. Similarly, or dually, 3 hyperplanes intersect in a particular 4-d projective
subspace, or flat.

3. Any plane in this space is defined by 3 non-collinear points, and their 4 unique
linear dependencies in the corresponding vector space. Hereafter, a plane will be
represented as (a, b, c, a+b, b+c, a+c, a+b+c), with the non-canonical choice
of non-collinear points within this plane being implicit as (a, b, c).

4. A plane contains 7 lines, thus being a Fano plane.

5. A plane is reachable from 7 hyperplanes and 7 points in the corresponding
lattice structure through transitive join and meet operations. Such an hourglass
structure will be critical in our proofs.

6. Similarly, a hyperplane is reachable from 31 points: 5 of them being independent,
and others representing the linearly dependent vectors on these.
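
The tuple representations in items 1, 3 and 4 can be tried out directly. The sketch below assumes points are stored as nonzero bitmask integers, so that the sum a+b over GF(2) is the XOR of the two masks; the helper names and the particular choice of a, b, c are illustrative.

```python
from itertools import combinations

def line(a, b):
    """The line through two distinct points: (a, b, a+b) over GF(2)."""
    return frozenset((a, b, a ^ b))

def plane(a, b, c):
    """The plane through three non-collinear points, with its 4 dependent points."""
    return frozenset((a, b, c, a ^ b, b ^ c, a ^ c, a ^ b ^ c))

a, b, c = 0b000001, 0b000010, 0b000100   # an arbitrary non-collinear triple
P = plane(a, b, c)

# Every pair of points of the plane spans a line that stays inside the plane.
lines_in_plane = {line(x, y) for x, y in combinations(P, 2)}
lines_per_point = {p: sum(p in l for l in lines_in_plane) for p in P}

print(len(P), len(lines_in_plane), sorted(set(lines_per_point.values())))
```

The plane has 7 points and 7 lines, with 3 lines through every point, which is the Fano plane structure referred to in item 4.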

A.1 Main Proofs

Proving the two theorems of Section 5.1 is done by demonstrating how one can
incrementally construct an embedded bipartite subgraph, by improving over the minimum
degree of a smaller subgraph. This requires looking at the planes involved in the
construction of the embedding.

Theorem 14. In the construction of the bipartite graph mentioned in Section 4.3, there
exists no embedded subgraph with partitions of size 9 each in which every vertex has
degree at least 8, i.e. with a minimum degree ($\delta$) of 8.
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Proof. Assume that such a subgraph exists. Then by lemma 11, any 3 points have 
at least 6 hyperplanes in common, and vice-versa. By lemma |6j the set of 9 points 
contains at least 3 non-collinear points. The 6 common hyperplanes to them in the 
subgraph must contain the (unique) plane defined by the points. If one of the points is 
contained in some different plane, then by lemma [7| this point can only be reachable 
from maximum 3 hyperplanes belonging to the first plane, and not (all) 6 common 
hyperplanes, a contradiction. Again, from lemma [6j one can pick a subset of 3 inde- 
pendent hyperplanes out of these 6 common hyperplanes. By dual arguments, these 3 
hyperplanes also must have 6 points in common, reachable from a single unique plane 
defined by the 3 hyperplanes. Thus, a 6-a-side bipartite subgraph with reachability 
defined by a single plane of the underlying projective space, is embedded in the 9-a-side 
bipartite subgraph we are trying to construct. 

• Case 1: Out of the remaining 3 points (hyperplanes) in the 9-a-side subgraph, we
can choose at most 1 point such that it is contained in all the 6 hyperplanes. This
point is the 7th point on the 7-7 hourglass passing through a single plane. It also
requires the remaining 7th hyperplane to be incident on all these 6+1 points. The
remaining 2 points are necessarily part of at least one other plane. This configuration
of 2 remaining points and 2 remaining hyperplanes may at most form a
$K_{2,2}$ complete induced subgraph, whatever the number of intervening
planes considered. As for their minimum degree, these 2 remaining points can, by
lemma 7, be reachable from at most 3 more hyperplanes reachable from
plane P1. Hence the maximum degree these two points achieve is 5 in any possible
construction. This contradicts the presence of the assumed subgraph having δ of 8.

• Case 2: Along similar lines, if we choose the 3 remaining points and hyperplanes
apart from the 6-a-side subgraph to form a complete bipartite subgraph $K_{3,3}$ by
any construction, then again by lemma 7, they can be reachable from at most
3 more hyperplanes, reachable from plane P1. In such a case they achieve at most
a minimum degree of 6, which is still less than the requirement of 8.

□ 
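
The counting step taken from lemma 7 in the proof above (a point lying outside a plane is reachable from at most 3 of the 7 hyperplanes through that plane) can be checked numerically. The sketch below is illustrative only: the integer encoding and the helper names `dot` and `span` are assumptions, and only one representative plane is examined.

```python
# Points and hyperplane normals of PG(5, GF(2)) as nonzero 6-bit integers (assumed encoding).
POINTS = range(1, 64)

def dot(u, v):
    """Inner product over GF(2) of two bit-vectors."""
    return bin(u & v).count("1") % 2

def span(*gens):
    """Nonzero vectors in the GF(2)-span of the generators."""
    pts = {0}
    for g in gens:
        pts |= {p ^ g for p in pts}
    return pts - {0}

plane = span(0b000001, 0b000010, 0b000100)

# Hyperplanes (by normal vector) that contain every point of the plane.
through_plane = [h for h in POINTS if all(dot(h, p) == 0 for p in plane)]

# For each point outside the plane, count how many of those hyperplanes also contain it.
counts = {sum(dot(h, q) == 0 for h in through_plane)
          for q in POINTS if q not in plane}

print(len(through_plane), counts)
```

For this plane, every outside point lies on exactly 3 of the 7 hyperplanes through the plane, which is the cap the two cases rely on.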

Theorem 15. In the construction of bipartite graph mentioned in section there 
exists no embedded subgraph with partitions of size 10 whose vertices all have a
minimum degree (δ) of 8.



Proof. Assume that such a subgraph exists. Then by lemma 12, any 3 points have
at least 4 hyperplanes in common, and vice versa. By lemma 6, the set of 10 points
contains at least 3 non-collinear points. The 4 hyperplanes common to them in the
subgraph must contain the (unique) plane defined by these points. If one of the points
is contained in some different plane, then by lemma 7 this point is reachable
from at most 3 hyperplanes belonging to the first plane, and not from all 4 common
hyperplanes, a contradiction. Again, by lemma 6, one can pick a subset of 3 independent
hyperplanes out of these 4 common hyperplanes. By dual arguments, these 3
hyperplanes must also have 4 points in common, reachable from a single unique plane
defined by the 3 hyperplanes. Thus, a 4-a-side bipartite subgraph, with reachability
defined by a single plane of the underlying projective space, is embedded in the
10-a-side bipartite subgraph we are trying to construct.

By considering only one intervening plane, we can go up to at most a 7-a-side
subgraph. Hence, to construct a 10-a-side graph, we need to consider at least one more
plane in the construction. We now individually consider the cases where the maximum
number of points taken from any one of the planes is n, with 4 ≤ n ≤ 7.

• Case 1: Assume that the maximum possible set of 7 points and some number of
hyperplanes are taken from the plane P1 intervening the 4-a-side subgraph already
present. The number of hyperplanes considered from P1 could therefore be anywhere
between 4 and 7. Hence we need to consider at least one more intervening
plane between the remaining hyperplanes (minimum: 3, maximum: 6) and the 3
remaining points. In the best possible construction, these remaining hyperplanes
and points form a biclique. Then any particular hyperplane from this biclique
is reachable from at most all 3 points of this biclique, plus at most
3 more points of plane P1 (refer to lemma 7), for a degree of at most 6. Hence the
degree requirement of these hyperplanes is unsatisfiable using a 7-* partition of the
10-point set, a contradiction.

• Case 2: Next, assume that 6 points and some number of hyperplanes are taken
from the plane P1 intervening the 4-a-side subgraph already present. The number
of hyperplanes considered from P1 could therefore be anywhere between 4 and 7.
Hence again we need to consider at least one more intervening plane between the
remaining hyperplanes (minimum: 3, maximum: 6) and the 4 remaining points. In
the best possible construction, these remaining hyperplanes and points again form
a biclique. Then any particular hyperplane from this biclique is reachable from
at most all 4 points of this biclique, plus at most 3 more points of
plane P1 (refer to lemma 7), for a degree of at most 7. Hence the degree requirement
of these hyperplanes is unsatisfiable using a 6-* partition of the 10-point set, a
contradiction.

• Case 3: Next, assume that 5 points and some number of hyperplanes are taken
from the plane P1 intervening the 4-a-side subgraph already present. The number
of hyperplanes considered from P1 could therefore be anywhere between 5 and 7.
Hence again we need to consider at least one more intervening plane between the
remaining hyperplanes (minimum: 3, maximum: 5) and the 5 remaining points.

First, we claim that in this case a subgraph $K_{5,5}$ having one intervening plane
always exists. To see this, take the lone boundary case where 5 points and
4 (the minimum) hyperplanes are taken from plane P1, so that a biclique is not
provided by the incidences of P1. Then the remaining 6 hyperplanes must belong to
plane(s) different from P1. By lemma 7, they can be reachable from at most 3
of the 5 points reachable from P1. To satisfy their requirement δ ≥ 8, they should
therefore be reachable from all 5 of the remaining points. Hence the set of 6 remaining
hyperplanes and 5 remaining points forms a $K_{6,5}$ biclique, and hence also contains
a $K_{5,5}$. By lemma 6 and the fact that a plane formed by 3 independent points is
reachable from 7 hyperplanes, which is simultaneously the minimum and the maximum,
this $K_{5,5}$ biclique contains exactly 1 intervening plane different from P1.

By abuse of notation, let us call the plane intervening the always-present $K_{5,5}$
subgraph P1. Then at most 7 hyperplanes can be considered from P1 in the
construction, and hence 3, 4 or 5 hyperplanes and the remaining 5 points need to be
added to $K_{5,5}$ by considering other planes. When either 3 or 4 hyperplanes
are considered from other planes, the set of 5 remaining points cannot have their
degree requirements satisfied: these points can be reachable from at most 4
hyperplanes from other planes, and at most 3 more from plane P1 (refer to
lemma 7). Hence we look into the case when 5 hyperplanes and 5 points are considered
by looking into other planes.

In this case, each of the 5 remaining hyperplanes needs to be reachable
from 3 different points reachable from P1. These 3 points cannot
be independent, for if they were, the corresponding hyperplane would have
been on P1 rather than on any other plane. Hence each of the 5 such collections
of 3 points forms a line. Given a plane in PG(5, GF(2)) having its point set as
(a, b, c, a+b, b+c, a+c, a+b+c), it is immediately obvious that no subset of 5 points
contains 5 different lines. In fact, to have 5 different lines contained in some point
subset, the minimum size of the subset required is 7. Hence every possible construction
in this case leaves at least one hyperplane not having its degree requirement
satisfied, a contradiction.
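
The line-counting claim at the end of Case 3 can be confirmed exhaustively on a single plane. The sketch below assumes the 7 points of the plane are encoded as the nonzero 3-bit integers, so that the third point of the line through x and y is `x ^ y`; the function name `max_lines_inside` is hypothetical.

```python
from itertools import combinations

# The 7 points (a, b, c, a+b, b+c, a+c, a+b+c) of a plane, as nonzero 3-bit vectors.
plane = range(1, 8)

# Every pair of points determines a line {x, y, x+y}; there are 7 distinct lines.
lines = {frozenset((x, y, x ^ y)) for x, y in combinations(plane, 2)}

def max_lines_inside(k):
    """Largest number of lines fully contained in some k-point subset of the plane."""
    return max(sum(line <= set(subset) for line in lines)
               for subset in combinations(plane, k))

table = {k: max_lines_inside(k) for k in range(3, 8)}
print(table)   # {3: 1, 4: 1, 5: 2, 6: 4, 7: 7}
```

A 5-point subset holds at most 2 lines and a 6-point subset at most 4, so 5 distinct lines indeed require all 7 points. The same table shows that 2 distinct lines already need 5 points, the fact used in the first sub-case of Case 4.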

• Case 4: Finally, assume that 4 points and some number of hyperplanes are taken
from the plane P1 intervening the 4-a-side subgraph already present. The number
of hyperplanes considered from P1 could therefore be anywhere between 4 and 7.
Hence again we need to consider at least one more intervening plane between the
remaining hyperplanes (minimum: 3, maximum: 6) and the 6 remaining points. We
consider the following two sub-cases.

o In this sub-case, we assume that the remaining 6 points contain at least one subset
of size 3 forming a line. At least one of the 3 remaining points of this 6-set
will not be part of this line (a line has at most 3 points in PG(5, GF(2))).
This point and the line therefore form a plane P2, which is maximal, by our
assumption in Case 4. Since in this case any 3 points must have at least 4
common hyperplanes to satisfy their degree requirements, the 3 independent
points of plane P2 have 4 hyperplanes in common, and vice versa. The
remaining 2 hyperplanes, hereafter referred to as H1 and H2, do not contain both
P1 and P2. More formally, by lemmas 9 and 10, the best case occurs when

$H_i \cap P_j$ = a line, for i, j = 1 and 2.

Hence these hyperplanes can be reachable from at most 6 points reachable
from P1 and P2. In fact, they need to be reachable from exactly 6 to satisfy
their degree requirements. This reachability from 6 points by each of the 2
hyperplanes must consist of reachability from one line each from planes P1 and
P2.

By lemmas 9 and 10, the lines in which H1 and H2 intersect, say, P1 cannot be
the same. In a plane of PG(5, GF(2)) denoted as (a, b, c, a+b, b+c, a+c, a+b+c),
one can clearly see that to accommodate 2 different lines, one needs to consider
a subset of at least 5 points. This contradicts our construction, in which we
originally had 2 collections of 4 points each contained in two planes.

o In this sub-case, we assume that the remaining 6 points do not contain any subset
of size 3 which is dependent. This means that any triplet of points
from among these defines a plane. We arbitrarily consider two disjoint
triplets from these 6 remaining points, and consider their planes P2 and P3. Also
note that points from the triplet of P2 do not lie on P3, and vice versa, for we
are assuming in this sub-case that 4 coplanar points do not exist among these 6
remaining points. Again in this case, any 3 points must have at least 4 common
hyperplanes to satisfy their degree requirements, and vice versa. Hence we again
have 4 points and 4 hyperplanes being considered in the construction for each of the
planes P1, P2 and P3. Note that there may be common points and hyperplanes
used in this construction. We consider 2 separate sub-subcases.

* In this sub-subcase, all the 10 hyperplanes of the required graph lie within the
union of the sets of 4 hyperplanes considered per plane.

Consider the 3 independent points of the triplet of plane P2. None of these
points can be the same as, or a linear combination of, any points reachable from
plane P1. These 3 points were earlier also shown to be independent, and hence
are not reachable from P3 either. A join of plane P1 and one different point
reachable from plane P2 in the corresponding lattice yields one different 3-d
flat each, in the lattice. With respect to plane P1, where we are considering
a set of 4 hyperplanes, it is obvious that in PG(5, GF(2)) at least 3 of these
hyperplanes are independent. If the remaining hyperplane is dependent on
2 of these 3, then a complete 3-d flat is reachable from this set of 4 hyperplanes.
By lemma 7, any point of P2 is reachable from at most 3 of the
hyperplanes reachable from P1, and that only when the 3 hyperplanes are part
of a complete 3-d flat. Hence it is possible that a particular point reachable
from P2 is also reachable from the unique 3-d flat, whenever it exists, and
thus from the 3 hyperplanes of P1 through this 3-d flat. Since the join of different
independent points of P2 with plane P1 gives rise to different 3-d flats, and
since there is at most one 3-d flat embedded in the set of 4 hyperplanes
being considered w.r.t. plane P1, at least two points reachable from plane
P2 can be reachable from at most 2 hyperplanes reachable from plane
P1. A similar conclusion can be reached with respect to the same 3 points
of plane P2 and the hyperplanes reachable from P3 that are under consideration
for this construction. In the best case, 2 distinct points out of the
3 points reachable from P2 have a degree of 3 towards planes P1 and P3,
respectively. Hence at least 1 point (the remaining one) reachable from plane
P2 is reachable from at most two hyperplanes each, reachable from planes
P1 and P3. A similar point can also be located on plane P3, the 3 points
reachable from which have an identical relation to the point sets of the remaining
2 planes.

Without loss of generality, we further claim that at most 3 of the 10 hyperplanes
used in the construction remain outside of those considered for planes P1
and P3 (or P2). When planes P1 and P3 are disjoint, they cover 8 of the
10 required hyperplanes of the construction. If the planes meet in a point,
then by duality arguments they also meet in a hyperplane, in which case
they cover 7 out of the 10 required hyperplanes of the construction. If P1 and
P3 meet in a line (the last possible scenario), then P3 has 1 point exclusively
belonging to itself. Hence in all cases, either P2 or P3 has at most 3 points
lying on it. On both these planes, we have located at least 1 point which
is reachable from at most 2 hyperplanes each of the remaining 2 planes. Hence
in all scenarios, there exists at least one point which can be reachable from
at most 3+2+2 = 7 hyperplanes, thus invalidating the construction of this
case.

* In this sub-subcase, not all the 10 hyperplanes of the required graph lie within
the union of the sets of 4 hyperplanes considered per plane.
Hence there is at least 1 hyperplane that is not covered by planes P1, P2
and P3, i.e. it is not reachable from any of these. By lemma 7 and the
fact that, by the assumption for this case, the triplets of points of P2 and
P3 are not collinear, one can see that the points of the planes P2 and P3
provide a degree of at most 2 each to the above hyperplane. Additionally, it may
be reachable from at most 3 points lying on plane P1.
By considering the planes P1, P2 and P3, we have exhausted all the points
of the construction, and the maximum degree this particular hyperplane has
achieved so far is 3+2+2 = 7, which is clearly not sufficient.

□

B An Eigenvalue-based Approach for Deriving £



This section provides an easier, alternative approach for deriving weaker upper
bounds on the random error correction capability of our code. This approach can be used
for the extreme cases when one has to consider a code rate that is less than 0.1
(equivalently, e > 15). The arguments in this approach are very similar to the ones
given in [3]. Let $A = (a_{ij})$ be the $2n \times 2n$ adjacency matrix of the bipartite graph
$G(V, E)$ of degree $d$. That is, $a_{ij} = 1$ if there is an edge between the vertices indexed
by $i$ and $j$, and $a_{ij} = 0$ otherwise. Let $S$ be the set of vertices of the graph $G$ that form
the minimal configuration of failure. Let $x_S$ be a column vector of size $2n$ such that
every coordinate indexed by a vertex of $S$ equals 1 and the other coordinates equal 0.
Now we have

$$ x_S^T A x_S = \sum_{v \in S} \delta_{G_S}(v) \tag{10} $$

where $\delta_{G_S}(v)$ is the degree of vertex $v$ in the subgraph $G_S$ induced by the vertex set $S$
in $G$.

Let $j$ be the all-ones vector; $j$ is the eigenvector of $A$ associated with the eigenvalue $d$.
Define $y_S$ such that

$$ x_S = \frac{|S|}{2n}\, j + y_S \tag{11} $$

$y_S$ has $|S|$ coordinates equal to $1 - \frac{|S|}{2n}$ and $2n - |S|$ coordinates equal to
$-\frac{|S|}{2n}$, and $y_S$ is orthogonal to $j$. Therefore, we can write

$$ x_S^T A x_S = \frac{|S|^2}{4n^2}\, d\, (j \cdot j) + y_S^T A y_S $$

Since $j \cdot j = 2n$, we have

$$ x_S^T A x_S = \frac{|S|^2}{2n}\, d + y_S^T A y_S \tag{12} $$

Now, $y_S$ is orthogonal to $j$, and the eigenspace associated with the eigenvalue $d$ is of
dimension 1 ($G$ is connected). Therefore we have $y_S^T A y_S \le \lambda \|y_S\|^2$, where $\lambda$ is the
second largest eigenvalue of $A$. Considering the structure of $y_S$ as explained above, we
have

$$ \|y_S\|^2 = |S| \left(1 - \frac{|S|}{2n}\right)^2 + (2n - |S|) \left(\frac{|S|}{2n}\right)^2 = |S| - \frac{|S|^2}{2n} $$
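
The decomposition of $x_S$ and the identities (10) and (12) can be checked numerically on a small regular bipartite graph. The sketch below is only an illustration: it uses the point-line incidence graph of the Fano plane (2n = 14, d = 3) as a hypothetical stand-in for G, not the graph of the paper, and the subset S is arbitrary.

```python
from itertools import combinations

# Hypothetical stand-in for G: point-line incidence graph of the Fano plane
# (2n = 14 vertices, regular of degree d = 3). Not the graph used in the paper.
n, d = 7, 3
points = list(range(1, 8))
lines = sorted({tuple(sorted((p, q, p ^ q))) for p, q in combinations(points, 2)})

# Adjacency matrix: vertices 0..6 are points, vertices 7..13 are lines.
A = [[0] * (2 * n) for _ in range(2 * n)]
for i, p in enumerate(points):
    for k, line in enumerate(lines):
        if p in line:
            A[i][n + k] = A[n + k][i] = 1

def quad(v):
    """Quadratic form v^T A v."""
    return sum(v[r] * A[r][c] * v[c] for r in range(2 * n) for c in range(2 * n))

S = {0, 1, 2, 7, 8}                       # an arbitrary vertex subset
s = len(S)
x = [1.0 if v in S else 0.0 for v in range(2 * n)]
y = [xv - s / (2 * n) for xv in x]        # from x_S = (|S|/2n) j + y_S

degree_sum = sum(A[u][v] for u in S for v in S)   # right-hand side of (10)
rhs_12 = s * s * d / (2 * n) + quad(y)            # right-hand side of (12)
norm_y_sq = sum(t * t for t in y)                 # ||y_S||^2

print(quad(x), degree_sum, rhs_12, norm_y_sq, s - s * s / (2 * n))
```

Up to floating-point rounding, the first three printed values agree with one another, and so do the last two.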



Since we are looking for subgraphs in which the degree of each vertex is at least a
certain value (say $\gamma$), after combining the various equations and inequalities above, we
get

$$
\begin{aligned}
\gamma |S| &\le x_S^T A x_S \\
&= \frac{|S|^2}{2n}\, d + y_S^T A y_S \\
&\le \frac{|S|^2}{2n}\, d + \lambda \|y_S\|^2 \\
&= \frac{|S|^2}{2n}\, d + \lambda \left( |S| - \frac{|S|^2}{2n} \right)
\end{aligned}
$$

Finally, since $|S| > 0$, we can cancel it from both sides, and the expression that we
arrive at is

$$ |S| \ge \frac{2n(\gamma - \lambda)}{d - \lambda} \tag{13} $$
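
To see how a bound of this form behaves, one can compare it with an exhaustive search on a graph small enough to enumerate. The sketch below again uses the Fano-plane incidence graph as a hypothetical stand-in (2n = 14, d = 3), whose second largest eigenvalue is known to be the square root of 2; the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Hypothetical stand-in: Fano-plane incidence graph, 2n = 14, d = 3, lambda = sqrt(2).
n, d, lam = 7, 3, sqrt(2)
points = list(range(1, 8))
lines = sorted({tuple(sorted((p, q, p ^ q))) for p, q in combinations(points, 2)})

adj = {v: set() for v in range(2 * n)}
for i, p in enumerate(points):
    for k, line in enumerate(lines):
        if p in line:
            adj[i].add(n + k)
            adj[n + k].add(i)

def eigenvalue_bound(gamma):
    """Lower bound on |S|: 2n (gamma - lambda) / (d - lambda)."""
    return 2 * n * (gamma - lam) / (d - lam)

def smallest_subgraph(gamma):
    """Size of the smallest induced subgraph with minimum degree >= gamma."""
    for size in range(1, 2 * n + 1):
        for S in combinations(range(2 * n), size):
            chosen = set(S)
            if all(len(adj[v] & chosen) >= gamma for v in S):
                return size
    return None

results = {g: (eigenvalue_bound(g), smallest_subgraph(g)) for g in (2, 3)}
print(results)
```

For this stand-in graph the exhaustive minimum is 6 vertices for minimum degree 2 (a shortest cycle) and all 14 vertices for minimum degree 3, and in both cases it is no smaller than the eigenvalue bound.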
Because of the duality of points and hyperplanes, we get £ = $|S|/2$. Thus,

$$ \text{£} \ge \frac{n(\gamma - \lambda)}{d - \lambda} \tag{14} $$

The above formula is also stated in pQ in the context of finding the minimum distance 
of the code proposed by him using PG(2, q). In our case, the above formula should be 
used only for e > 15. For all practical values of e (< 15), the combinatorial methods 
detailed earlier give a very tight bound.
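
As a rough illustration of how the eigenvalue bound compares with the combinatorial one, the sketch below evaluates the per-side quantity $n(\gamma - \lambda)/(d - \lambda)$ under the assumption that G is the point-hyperplane incidence graph of PG(5, GF(2)) used in Appendix A. The value of $\lambda$ is not quoted from the paper: it is derived in the code from the fact that, for this incidence structure, any two distinct points share the same number $\mu$ of hyperplanes, so the nontrivial eigenvalues of A are $\pm\sqrt{d - \mu}$. The choice $\gamma = 8$ mirrors Theorems 14 and 15 and is made here purely for comparison.

```python
from fractions import Fraction
from math import isqrt

# Points and hyperplane normals of PG(5, GF(2)) as nonzero 6-bit integers (assumed encoding).
POINTS = range(1, 64)

def dot(u, v):
    """Inner product over GF(2) of two bit-vectors."""
    return bin(u & v).count("1") % 2

# on[p] = set of hyperplanes (by normal vector) containing point p.
on = {p: {h for h in POINTS if dot(h, p) == 0} for p in POINTS}
n = len(on)

degrees = {len(hs) for hs in on.values()}
common = {len(on[p] & on[q]) for p in POINTS for q in POINTS if p < q}
(d,), (mu,) = degrees, common          # both sets have exactly one element

# N N^T = (d - mu) I + mu J, so the second largest eigenvalue of A is sqrt(d - mu).
lam = isqrt(d - mu)

def per_side_bound(gamma):
    """n (gamma - lambda) / (d - lambda), kept as an exact fraction."""
    return Fraction(n * (gamma - lam), d - lam)

print(n, d, mu, lam, per_side_bound(8))   # 63 31 15 4 28/3
```

Under these assumptions the eigenvalue bound gives 28/3, i.e. at least 10 vertices per side for minimum degree 8, whereas Theorems 14 and 15 already exclude 9-a-side and 10-a-side subgraphs, which is consistent with the eigenvalue bound being the weaker of the two.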